PREFACE

The Seventh Copper Mountain Conference on Multigrid Methods was held on April 
2-7, 1995, at Copper Mountain, Colorado, and was sponsored by NASA and the 
Department of Energy. The University of Colorado, Front Range Scientific Computations, 
Inc., and the Society for Industrial and Applied Mathematics provided organizational 
support for the conference. 

This document is a collection of many of the papers that were presented at the
conference and thus represents the conference proceedings. NASA Langley has graciously
provided printing of this book so that all of the papers could be presented in a single 
forum. Each paper was reviewed by a member of the conference organizing committee 
under the coordination of the editors. 

The multigrid discipline continues to expand and mature, as is evident from these 
proceedings. The vibrancy and diversity in this field are amply expressed in these 
important papers, and the collection clearly shows the continuing rapid growth of the 
use of multigrid acceleration techniques.

MULTIGRID HISTORY

(At the awards ceremony of the conference, Achi Brandt presented the following 
history of multigrid. The reader should study the truths contained herein and revel in 
the humor.) 

The early history of multigrid has recently become a hot subject of research. An 
ancient multigrid code was uncovered during extensive excavations last year in northern 
Turkestan. Carbon tests indicate that this code has an efficiency of 5.1 on the Richter 
scale. Some researchers believe that the V cycle was practiced by the Neanderthals. 
The use of the Full Multigrid (FMG) algorithm was, however, unique to Homo sapiens and 
is one of the major reasons for their ultimate survival. Prototypes of two-grid algorithms 
predate the first hominids. Most historians agree that coarsening was, in fact, invented 
by the dinosaurs; however, coarse-to-fine grid transfers were unknown to them, which 
explains their extinction. 

Earlier geological findings include rich multilevel deposits that have been unearthed 
in several North American gold mines, and thick layers of old multigridders have been 
discovered at Copper Mountain. 

The artifacts at the northern Turkestan site indicate that an early form of residual 
weighting was already in widespread use before the middle Full Approximation Storage 
(FAS) period. When Copernicus first introduced line relaxation, it was banned by the 
Catholic church. Pope Pointus the Square decreed that mere mortals should not practice 
such nonlocal schemes. He feared this practice would lead humanity to incompleteness, 
in particular to the incomplete LU decomposition of the Dutch church. The advent of 
variational coarsening during the French Revolution marks the dawn of the modern era, 
which is quite familiar to us all.

A PRESSURE BASED MULTIGRID PROCEDURE FOR THE
NAVIER-STOKES EQUATIONS ON UNSTRUCTURED GRIDS 

R. Jyotsna and S. P. Vanka 
Department of Mechanical and Industrial Engineering 
University of Illinois at Urbana-Champaign, Urbana, IL. 61801 

ABSTRACT 

We present details and performance of a pressure based multigrid solution procedure for the 
Navier-Stokes equations discretized on triangular grids. The discretization uses a control volume 
methodology, with linear inter-nodal variation of the flow variables. The use of the multigrid 
technique provides rapid and grid-independent rates of convergence. Three model driven cavity 
flows are computed, and the performance of the method at several grid densities and Reynolds 
numbers is reported. Representative flow fields characterizing the viscous eddies are also 
presented. 

1. INTRODUCTION 

The multigrid technique [1] provides an efficient means of smoothing high and low 
frequency errors that arise during the iterative solution of elliptic equations. Multigrid acceleration 
of solution procedures on unstructured meshes has been demonstrated earlier for single elliptic 
equations [2,3], for Euler equations [4-7], and for the compressible Navier-Stokes equations [8]. 
These procedures have used complete remeshing to generate a sequence of independent coarse and 
fine grids. Because of the independence of the grids, inter-grid transfers are somewhat 
complicated. Another strategy to coarsen a given fine grid is 'volume agglomeration', where the 
fine grid control volumes are progressively combined to obtain coarser control volumes. The 
resulting coarse grid volumes in this procedure do not have the same shapes as those of the finest 
grid, thus requiring special practices for constructing the discrete operators. The volume 
agglomeration technique is reviewed in reference [6]. 

The present paper describes a pressure based multigrid calculation procedure for unstructured 
grids. The discretization scheme is based on a control volume integration of the governing 
equations analogous to the practices followed in references [9-12]. On any given grid, the solution 
procedure employs a decoupled relaxation in conjunction with a pressure equation obtained 
through combination of the continuity and momentum equations in a special way [10]. In contrast 
with the coupled multigrid procedure followed in Vanka [13], and recently in Webster [14], the 
decoupled solution procedure is simpler to implement, and is better suited for use with a variety of 
linear solvers. In this paper, we discuss the details of the multigrid implementation, and its 
performance in three model driven-cavity flows. We have considered as examples, flows in a 
square cavity, a triangular cavity, and a semicircular cavity. The flow domain is discretized by 
Delaunay triangulation [15], with the fine grid obtained by uniform refinement of each triangle. In 
the following sections, we first describe the single grid procedure and its performance at increasing 
refinements of the mesh. Next, we describe the details of the components of the multigrid 
procedure (coarse grid equations, restriction, prolongation). The performance of the procedure in 
the three configurations at increasing Reynolds numbers is next presented along with brief 
descriptions of the flow fields. 

2. GOVERNING EQUATIONS AND DISCRETIZATION PROCEDURE 

Currently, we consider only the Navier-Stokes equations governing a two-dimensional, 
steady, incompressible flow of constant fluid properties. Thus the equations that are solved can be 
written in primitive variables (u, v, p) as 


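$$ \nabla \cdot (\mathbf{u}\,u) = -\frac{\partial p}{\partial x} + \nu\,\nabla \cdot (\nabla u) + B_x \tag{1} $$

$$ \nabla \cdot (\mathbf{u}\,v) = -\frac{\partial p}{\partial y} + \nu\,\nabla \cdot (\nabla v) + B_y \tag{2} $$

$$ \nabla \cdot \mathbf{u} = 0 \tag{3} $$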

Here u and v are the two components of the velocity vector $\mathbf{u}$, and p is the pressure divided by the
density; $\nu$ is the kinematic viscosity, and $B_x$ and $B_y$ provide a means to include other forces such
as those due to gravity and rotation.

The above equations are discretized on a triangular mesh shown in Figure 1(a). We use a 
control volume procedure essentially the same as that described in Prakash and Patankar [10], 
except that we have preferred to retain the central differencing scheme. In Prakash and Patankar 
[10] and related works, an exponential variation was introduced for stability at high cell Peclet 
numbers. Such a differencing scheme, although it provides stability, reduces the accuracy to first 
order, and is not satisfactory. Currently, we instead refine the finest mesh until the cell Peclet
number falls below the stability limit of the central scheme. Thus for a given grid, there exists a
maximum flow Reynolds number that cannot be exceeded.
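The link between mesh size and the largest usable Reynolds number can be illustrated with a short sketch. It assumes the textbook one-dimensional limit of 2 for the cell Peclet number of central differencing and takes the local velocity equal to the reference velocity; the paper does not state the limit it used, and the function names are hypothetical.

```python
def cell_peclet(velocity, cell_size, viscosity):
    """Cell Peclet number |u| h / nu for one mesh cell."""
    return abs(velocity) * cell_size / viscosity


def max_stable_reynolds(length, cell_size, pe_limit=2.0):
    """Largest Re = U L / nu such that U h / nu does not exceed pe_limit.

    pe_limit = 2 is the usual 1D uniform-grid bound for central
    differencing (an assumption, not a value quoted in the paper).
    """
    return pe_limit * length / cell_size


# Halving the cell size doubles the admissible Reynolds number.
for cells_per_side in (8, 16, 32, 64):
    h = 1.0 / cells_per_side
    print(cells_per_side, max_stable_reynolds(1.0, h))
```

On a triangular mesh the relevant cell size and local velocity vary from element to element, so this estimate is only indicative.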

Figure 1(a) shows the control volume constructed around a representative node P, by joining 
the centroids of the relevant triangles to the midpoints of the sides. The equations are integrated 
over each of these control volumes to obtain nodal values of pressure and velocity. The checkerboard
split in the pressure field that arises in such equal-order interpolation is avoided by requiring
a different set of velocities $(\tilde{u}, \tilde{v})$, located at the cell interfaces, to satisfy mass continuity. This
practice is similar to the momentum interpolation concept used in collocated finite volume schemes 
[16-18]. 

The Momentum Balances 

Integrating equation (1) over the discrete control volume ABCDEF and using the divergence 
theorem, we have 

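$$ \oint_S \left[(\mathbf{u}\,u - \nu \nabla u) \cdot \mathbf{n}\right] dS = \int_V \left(B_x - \frac{\partial p}{\partial x}\right) dV \tag{4} $$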

where S is the enclosing surface of control volume V. 

Consider now element PAB (Figure 1(b)), which has two faces $a_1c$ and $ca_3$ bounding the
control volume around P. The contributions from these two surfaces to the flux balance can be
written as

$$ \int_{a_1}^{c} (\mathbf{J}_u \cdot \mathbf{n})\, dS + \int_{c}^{a_3} (\mathbf{J}_u \cdot \mathbf{n})\, dS \tag{5} $$

where $\mathbf{J}_u = \mathbf{u}\,u - \nu \nabla u$.

To compute the flux $\mathbf{J}_u$, we use a linear interpolation of velocities between the nodes of PAB.
Pressure is also assumed to vary linearly. Further, it is convenient to integrate the flux terms in
local coordinates (X, Y), defined with the origin at the centroid of the element. The components of
$\mathbf{J}_u$ are then expressed in terms of the nodal values of u because of the linear interpolation used.

Using Simpson's rule to evaluate the integrals, it can be shown that after collecting like terms and 
simplifying the complete equation, the resulting equation has the form 

$$ A_P u_P = \sum A_{nb} u_{nb} + V_P \left\langle B_x - \frac{\partial p}{\partial x} \right\rangle_P \tag{6} $$

where $u_P$ is the value of u at point P and $u_{nb}$ represents values at the neighboring nodes A, B, C,
D, E and F. $V_P$ is the area of the control volume around P, and $\langle\,\rangle$ is an average defined by

$$ \langle B \rangle = \frac{1}{V_P} \sum_e \left[\frac{A_i}{3} B_i\right] \tag{7} $$

where $A_i$ is the area of element i around P, and $\sum_e$ denotes summation over all the elements
contributing to $V_P$. The expressions for the coefficients are not provided here, but can be derived

by the above mentioned steps. Following the same procedure for equation (2), we can obtain the 
discretized y-momentum balance as 


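$$ A_P v_P = \sum A_{nb} v_{nb} + V_P \left\langle B_y - \frac{\partial p}{\partial y} \right\rangle_P \tag{8} $$

It is convenient to define momentum velocities $\hat{u}$ and $\hat{v}$ as

$$ \hat{u} = \frac{\sum A_{nb} u_{nb}}{A_P}, \qquad \hat{v} = \frac{\sum A_{nb} v_{nb}}{A_P} \tag{9} $$

so that

$$ u = \hat{u} + \frac{V_P}{A_P} \left\langle B_x - \frac{\partial p}{\partial x} \right\rangle_P \quad \text{and} \quad v = \hat{v} + \frac{V_P}{A_P} \left\langle B_y - \frac{\partial p}{\partial y} \right\rangle_P \tag{10} $$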
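The area-weighted average of equation (7) can be sketched in a few lines. The sketch assumes that each triangle around P contributes one third of its area to the control volume, which is what joining centroids to side midpoints produces; the function name and argument layout are hypothetical.

```python
def control_volume_average(element_areas, element_values):
    """Return (V_P, <B>_P) from areas A_i and element values B_i around node P."""
    if len(element_areas) != len(element_values):
        raise ValueError("one value per surrounding element is required")
    # Each surrounding triangle contributes one third of its area to V_P.
    v_p = sum(a / 3.0 for a in element_areas)
    weighted = sum((a / 3.0) * b for a, b in zip(element_areas, element_values))
    return v_p, weighted / v_p


# Three elements around P with areas 3, 3, 6 and element values 1, 1, 4.
v_p, avg = control_volume_average([3.0, 3.0, 6.0], [1.0, 1.0, 4.0])
print(v_p, avg)
```

Larger elements weigh more heavily in the average, so here the value 4 on the largest element pulls the result above the simple mean of 2.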

The Continuity Equation 

In the present procedure, u and v located at the nodal points do not satisfy the continuity 
equation. Rather, the cell face fluxes are balanced for each control volume. These cell face fluxes 
are interpolants of the nodal values in a special way that preserves the connections between the 
nodal pressures. The practice is similar to the momentum interpolation scheme used in finite 
volume schemes with a collocated arrangement of velocities and pressure [16-18]. 

We define a new set of velocities $\tilde{u}$ and $\tilde{v}$, located at the interfaces, and related to $\hat{u}$ and $\hat{v}$
by

$$ \tilde{u} = \hat{u} + D \left(B_x - \frac{\partial p}{\partial x}\right) \quad \text{and} \quad \tilde{v} = \hat{v} + D \left(B_y - \frac{\partial p}{\partial y}\right) \tag{11} $$

where $D = V_P / A_P$. The pressure gradients in equations (11) are evaluated locally for each
element. The discrete continuity equation is obtained from

$$ \nabla \cdot \tilde{\mathbf{u}} = 0 \tag{3} $$

written as

$$ \oint_S (\tilde{\mathbf{u}} \cdot \mathbf{n})\, dS = 0 \tag{12} $$

The values of D at points within the element are linearly interpolated from the nodal values. The
pressure gradients $\partial p/\partial x$ and $\partial p/\partial y$ are now local at the cell faces, and can be related to the
nodal pressures $(p_P, p_A, p_B)$ because of the linear interpolation used. If the equations for $\partial p/\partial x$ and
$\partial p/\partial y$ are substituted in the two interface flux relations, the contributions from element PAB to the
continuity at node P are obtained. Similar contributions from all elements surrounding P
then provide a pressure equation at P given by

$$ A^p_P\, p_P = \sum A^p_{nb}\, p_{nb} + M_P \tag{13} $$

where $M_P$ is the source term arising from the terms containing $\hat{u}$, $\hat{v}$ and $B_x$, $B_y$. We now
seek a solution (u, v, p) that satisfies the set of discrete equations (6), (8) and (13).

3. SINGLE GRID SOLUTION STRATEGY AND PERFORMANCE 

The system of coupled equations (6), (8) and (13) has been previously solved by a sequential 
solution method, SIMPLER [19]. The iterative update involves solving in a cycle the pressure 
equation, followed by the two momentum equations. Starting from guessed velocity and pressure 
fields, the coefficients $A_P$ and $A_{nb}$ are first assembled. Using these, the pressure equation is
assembled through the above mentioned formulae. The pressure equation is then solved by any
convenient linear solver. For simplicity, we have used a point Gauss-Seidel scheme, which is
repeated a few (nswpp) times. This pressure field is then used to solve the velocity equations. The
previously assembled $A_P$ and $A_{nb}$ are used, and a few (nswpm) sweeps of the Gauss-Seidel
scheme are made. The new velocity field is then used for calculating the next iterate of the pressure 
field. 
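The point Gauss-Seidel relaxation used for each equation can be sketched as follows, for a generic discrete equation written in the form $A_P \phi_P = \sum A_{nb} \phi_{nb} + b$. This is a minimal illustration rather than the authors' code: the data layout (a list of neighbor index and coefficient pairs per node) and the function name are assumptions, and `nsweeps` plays the role of nswpp or nswpm.

```python
def gauss_seidel_sweeps(a_p, neighbors, source, phi, nsweeps):
    """Relax a_p[i]*phi[i] = sum(a_nb*phi[j]) + source[i] in place.

    neighbors[i] is a list of (j, a_nb) pairs for node i.
    """
    for _ in range(nsweeps):
        for i in range(len(phi)):
            total = source[i]
            for j, a_nb in neighbors[i]:
                total += a_nb * phi[j]  # uses already-updated neighbor values
            phi[i] = total / a_p[i]
    return phi


# 1D diffusion test: three interior nodes between boundary values 0 and 1.
a_p = [2.0, 2.0, 2.0]
neighbors = [[(1, 1.0)], [(0, 1.0), (2, 1.0)], [(1, 1.0)]]
source = [0.0, 0.0, 1.0]  # boundary contributions folded into the source
phi = gauss_seidel_sweeps(a_p, neighbors, source, [0.0, 0.0, 0.0], 60)
print(phi)  # approaches the linear profile 0.25, 0.5, 0.75
```

In the sequential procedure described above, only a few such sweeps are applied per outer iteration, since the coefficients are reassembled from the updated fields anyway.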


A point to mention is the under-relaxation used to hold the iterative process from becoming 
unstable. This is done by adding only a part of the change to the flow variables in an implicit 
manner by modifying the central coefficients and the source terms in the discrete equations. Figure 
2 shows the behavior of the single grid scheme for flow in a driven square cavity, discretized on a 
triangular grid with increasing number of elements. As is evident, the convergence deteriorates 
with increasing number of nodes, which significantly increases the cost of performing systematic 
mesh refinement studies. 
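The implicit under-relaxation mentioned above is commonly implemented by dividing the central coefficient by the relaxation factor and adding a matching term to the source. The sketch below follows that common practice; the paper does not give its exact formula, so the form used here and the names `under_relax` and `alpha` are assumptions.

```python
def under_relax(a_p, source, phi_old, alpha):
    """Return (a_p_relaxed, source_relaxed) for under-relaxation factor alpha.

    Solving a_p_relaxed*phi = sum(a_nb*phi_nb) + source_relaxed moves phi
    only a fraction alpha of the way from phi_old to the unrelaxed update.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    a_p_relaxed = [a / alpha for a in a_p]
    source_relaxed = [b + (1.0 - alpha) / alpha * a * p
                      for a, b, p in zip(a_p, source, phi_old)]
    return a_p_relaxed, source_relaxed


# Single equation 2*phi = 4 (solution 2) starting from phi_old = 1.
a_rel, b_rel = under_relax([2.0], [4.0], [1.0], alpha=0.5)
print(b_rel[0] / a_rel[0])  # 1.5, halfway between 1 and 2
```

A converged solution is unaffected by the modification: if `phi_old` already satisfies the original equation, the relaxed equation returns the same value.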

4. DETAILS OF THE PRESENT MULTIGRID IMPLEMENTATION 
Mesh generation and refinement 

In the present procedure, the coarsest mesh is first generated as for any single grid 
procedure, by the Delaunay triangulation method. Subsequent finer grids are then generated by 
successively dividing each element into four elements (Figure 3(a)). A prespecified number of 
nested grids are thereby obtained. Each coarse grid element shares three nodes with the daughter 
finer grid elements. This grid arrangement makes the intergrid transfers as well as the construction 
of coarse grid equations simpler than with the practice of using different meshes for each grid 
density [4,5,7]. However, it has the disadvantage that the coarsest grid may not be very smooth. 
Nevertheless, the boundary shape is still accurately captured because during refinements, the 
daughter nodes are moved to coincide with the boundary shape. 
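The four-way subdivision can be sketched as below. The sketch keeps the coarse node numbers unchanged on the fine grid and records, for every new midpoint node, the two coarse nodes it lies between. It omits the step that moves new boundary nodes onto a curved boundary, and the names `refine` and `parents` are hypothetical.

```python
def refine(nodes, triangles):
    """Split every triangle into four by inserting edge midpoints.

    nodes: list of (x, y); triangles: list of (i, j, k) node indices.
    Returns (new_nodes, new_triangles, parents), where parents[m] = (i, j)
    gives the two coarse nodes on either side of midpoint node m.
    """
    new_nodes = list(nodes)
    midpoint_of = {}  # sorted edge (i, j) -> index of its midpoint node
    parents = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            (x0, y0), (x1, y1) = nodes[i], nodes[j]
            midpoint_of[key] = len(new_nodes)
            parents[len(new_nodes)] = key
            new_nodes.append((0.5 * (x0 + x1), 0.5 * (y0 + y1)))
        return midpoint_of[key]

    new_triangles = []
    for i, j, k in triangles:
        a, b, c = midpoint(i, j), midpoint(j, k), midpoint(k, i)
        # Three corner triangles plus the central one, same orientation.
        new_triangles += [(i, a, c), (a, j, b), (c, b, k), (a, b, c)]
    return new_nodes, new_triangles, parents


# Unit square split into two triangles, refined once.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
fine_nodes, fine_tris, parents = refine(nodes, [(0, 1, 2), (0, 2, 3)])
print(len(fine_nodes), len(fine_tris))  # 9 nodes, 8 triangles
```

Each refinement multiplies the element count by four, so four refinements of a 40-element mesh give 40 x 4^4 = 10240 elements, the finest-grid size quoted later for the 5-grid square cavity run.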

The coarse grid discrete equations

Successful multigrid procedures rely heavily on consistent practices for the construction of
the coarse grid equations and for the restriction and prolongation operators. Consistent restriction 
of variables and residuals to the coarser grids is the most important aspect of multigrid procedures 
for a system of equations, especially the fluid flow equations. For nonlinear equations, the Full 
Approximation Scheme (FAS) is the most suitable scheme for deriving the coarse grid equations. 
This is an extension of the more straight-forward Correction Scheme (CS) that is used for linear 
equations. 

Consider the discrete fine grid equations given by

$$ L^f q^f = F^f \tag{14} $$

where $L^f$ is the nonlinear operator matrix made of the convection and diffusion terms, $q^f$ is the
solution vector, and $F^f$ is the right-hand side vector. The superscript f is used to denote the fine
grid. After a few iterations on the fine grid, the residual is computed as

$$ R^f = F^f - L^f q^f \tag{15} $$


This residual is restricted to the next coarser grid, and it is required that the corrections satisfy the
equation

$$ L^{f-1}\, \Delta q^{f-1} = I_f^{f-1} R^f \tag{16} $$

where $L^{f-1}$ is the nonlinear operator on the coarse grid, $\Delta q^{f-1}$ is the vector of corrections on the
coarse grid, and $I_f^{f-1}$ is the restriction operator. For the FAS scheme, equation (16) is rewritten as


$$ L^{f-1} (\Delta q^{f-1} + I_f^{f-1} q^f) = I_f^{f-1} R^f + L^{f-1} (I_f^{f-1} q^f) = F^{f-1} + I_f^{f-1} R^f - \left(F^{f-1} - L^{f-1} (I_f^{f-1} q^f)\right) \tag{17} $$

or

$$ L^{f-1} q^{f-1} = F^{f-1} + \left(I_f^{f-1} R^f - R_0^{f-1}\right) \tag{18} $$

where $R_0^{f-1}$ is the residual on the coarse grid, calculated using the restricted solution vector, and
$q^{f-1}$ is the solution on the coarse grid. After a fixed number of iterations on the coarse grid, the
corrections implied by the coarse grid solution can be extracted from the relation

$$ \Delta q^{f-1} = q^{f-1} - I_f^{f-1} q^f \tag{19} $$
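The FAS steps of equations (15) to (19) can be illustrated on a much simpler problem than the Navier-Stokes equations. The sketch below applies a two-grid FAS cycle to the one-dimensional model problem $-u'' + u^3 = f$ with zero boundary values. The model problem, the nonlinear Gauss-Seidel smoother, the full-weighting residual restriction and all function names are assumptions chosen for illustration; only the structure of the cycle follows the text.

```python
def apply_operator(u, h):
    """Nonlinear operator L(u) = -u'' + u**3 at interior nodes."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (2.0 * u[i] - u[i - 1] - u[i + 1]) / h**2 + u[i] ** 3
    return out


def relax(u, rhs, h, sweeps):
    """Nonlinear Gauss-Seidel: one Newton step per node per sweep."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            lu = (2.0 * u[i] - u[i - 1] - u[i + 1]) / h**2 + u[i] ** 3
            u[i] += (rhs[i] - lu) / (2.0 / h**2 + 3.0 * u[i] ** 2)
    return u


def restrict_residual(r):
    """Full weighting of a fine-grid residual onto every second node."""
    n_c = (len(r) - 1) // 2
    out = [0.0] * (n_c + 1)
    for i in range(1, n_c):
        out[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return out


def prolong(e):
    """Inject at coincident nodes, average at in-between nodes."""
    out = [0.0] * (2 * (len(e) - 1) + 1)
    for i, value in enumerate(e):
        out[2 * i] = value
    for i in range(len(e) - 1):
        out[2 * i + 1] = 0.5 * (e[i] + e[i + 1])
    return out


def residual_norm(u, f, h):
    return max(abs(fi - li) for fi, li in zip(f, apply_operator(u, h)))


def fas_two_grid(u, f, h, pre=2, coarse=50, post=2):
    """One two-grid FAS cycle; u is updated in place."""
    relax(u, f, h, pre)
    residual = [fi - li for fi, li in zip(f, apply_operator(u, h))]  # (15)
    q0 = u[::2]  # restricted solution, by injection
    # Coarse right-hand side L(I q) + I R, the middle form of (17).
    rhs_c = [lq + r for lq, r in
             zip(apply_operator(q0, 2.0 * h), restrict_residual(residual))]
    q = relax(list(q0), rhs_c, 2.0 * h, coarse)
    correction = prolong([a - b for a, b in zip(q, q0)])  # (19)
    for i in range(1, len(u) - 1):
        u[i] += correction[i]
    relax(u, f, h, post)
    return u


n = 16
h = 1.0 / n
f = [0.0] + [10.0] * (n - 1) + [0.0]
u = [0.0] * (n + 1)
for cycle in range(5):
    fas_two_grid(u, f, h)
    print(cycle + 1, residual_norm(u, f, h))
```

Because the coarse grid works with the full solution rather than only a correction, the nonlinear term is evaluated on the coarse grid directly, which is the reason FAS is preferred for nonlinear equations.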


The above FAS scheme is used in a straight-forward way for the momentum equations. The 
restriction and prolongation operators defined below provide a consistent and convergent multigrid 
procedure. The main complexity in the present scheme lies in the construction of the pressure 
equation which satisfies mass continuity not for the nodal velocities but for a different set of fluxes 
implicitly located at the cell faces of the control volume. As the success of the present procedure 
relies solely on this aspect, we give below details of the coarse grid pressure equation.

The FAS form of the coarse grid pressure equation that results from the continuity
satisfaction condition is derived as follows. We begin with the correction equation

$$ (\nabla \cdot \tilde{\mathbf{u}}')^{f-1} = I_f^{f-1} R_c^f \tag{20} $$

where the prime denotes the correction in $\tilde{\mathbf{u}}$, and the right-hand side is the restricted residual in the
continuity equation. Equation (20) is expressed as

$$ \nabla \cdot (\tilde{\mathbf{u}} + \tilde{\mathbf{u}}')^{f-1} = I_f^{f-1} R_c^f + (\nabla \cdot \tilde{\mathbf{u}})^{f-1} \tag{21} $$

Now, 

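$$ \tilde{u} = \hat{u} + D\, \nabla p \quad \text{and} \quad \tilde{v} = \hat{v} + D\, \nabla p \tag{22} $$

where $\hat{\mathbf{u}}$ is the momentum velocity and $\nabla p$ is the pressure gradient that is used to evaluate the cell
face fluxes. For the coarse grid equations, the components of $\hat{\mathbf{u}}$ are defined as

$$ \hat{u} = \frac{R_u + \sum A_{nb} u_{nb}}{A_P} + (1 - \alpha)\, u $$

and

$$ \hat{v} = \frac{R_v + \sum A_{nb} v_{nb}}{A_P} + (1 - \alpha)\, v \tag{23} $$

where $R_u$ and $R_v$ are the net coarse grid momentum residuals defined from equation (21) as

$$ R = I_f^{f-1} R^f - R_0^{f-1} \tag{24} $$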

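Substituting equations (22) in (21), the coarse grid continuity equation is given by

$$ \nabla \cdot (\hat{\mathbf{u}} + D \nabla p + \hat{\mathbf{u}}' + D \nabla p')^{f-1} = I_f^{f-1} R_c^f + \nabla \cdot (\hat{\mathbf{u}} + D \nabla p)^{f-1} \tag{25} $$

where $p^{f-1}$ is the restricted pressure $I_f^{f-1} p^f$. Equation (25) can be further rewritten as

$$ \nabla \cdot (D \nabla p + D \nabla p')^{f-1} = I_f^{f-1} R_c^f - \nabla \cdot \hat{\mathbf{u}}^{f-1} + (\nabla \cdot D \nabla p + \nabla \cdot \hat{\mathbf{u}})^{f-1} = I_f^{f-1} R_c^f - \nabla \cdot \hat{\mathbf{u}}^{f-1} + R_{c0}^{f-1} \tag{26} $$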

where $R_{c0}^{f-1}$ is the coarse grid residual in the pressure equation calculated using the restricted
values of the variables. It must be noted that because of the segregated method of solution, $\hat{\mathbf{u}}'$ is set
to zero for the pressure equation. Now, in the FAS practice, the left-hand side terms of equation
(26) can be combined to give

$$ \nabla \cdot (D \nabla p)^{f-1} = -\nabla \cdot \hat{\mathbf{u}}^{f-1} + R_c^{f-1} \tag{27} $$

where $p^{f-1}$ is now redefined to be

$$ p^{f-1} = I_f^{f-1} p^f + (p')^{f-1} \quad \text{and} \quad R_c^{f-1} = I_f^{f-1} R_c^f + R_{c0}^{f-1} $$

Equation (27) has the standard structure of the pressure equation with an added residual $R_c^{f-1}$.

Restriction and prolongation operations 

Restriction and prolongation operators for structured rectangular and curvilinear grids are 
now well established. For an arbitrarily generated sequence of unstructured grids, the intergrid
transfers must be performed through systematic interpolations using appropriate geometric
coordinates of the variable locations [2]. An advantage of constructing fine grids embedded within
the coarse grids is that the simple injection scheme can be used as the restriction operator for the 
nodal variables. Thus coarse grid values for (u, v, p) are obtained by locating the fine grid 
daughter nodes coincident with the considered coarse grid nodes. 

For the residuals in the momentum equations, several fine grid residuals are summed to 
obtain the corresponding coarse grid residual $I_f^{f-1} R^f$. We need to determine the fractions of the
fine grid control volumes around a coarse grid node that contribute to the coarse grid control 
volume (see Figure 3(b)). The coarse grid control volume around P in two dimensions is given by 
the area ABCDEFGHIJKL. This is composed of fractions of the fine grid control volumes around 
each of the nodes P, A, B ... and L. It is apparent that the complete fine grid control volume 
around P contributes to the coarse grid volume. It can be shown that the rest of the coarse grid 
volume is made of the sum of half the fine grid volumes around each of the nodes A, B, ...and K. 
Therefore, the restricted residual at point P is the sum of the fine grid residual at point P, and half 
the fine grid residuals at the surrounding fine grid nodes. 
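The residual summation just described can be sketched as follows. The sketch assumes that coarse nodes keep their indices on the fine grid and that a hypothetical `parents` map gives, for each fine-grid midpoint node, the two coarse nodes it lies between; neither convention is stated in the paper.

```python
def restrict_residual(fine_residual, n_coarse, parents):
    """Sum fine-grid residuals onto coarse nodes.

    Each coarse node receives the full residual of its coincident fine
    node plus half the residual of every surrounding midpoint node.
    """
    coarse = list(fine_residual[:n_coarse])
    for m, (i, j) in parents.items():
        coarse[i] += 0.5 * fine_residual[m]
        coarse[j] += 0.5 * fine_residual[m]
    return coarse


# One coarse triangle (nodes 0, 1, 2) with midpoints 3, 4, 5.
parents = {3: (0, 1), 4: (1, 2), 5: (0, 2)}
fine = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
print(restrict_residual(fine, 3, parents))  # [7.0, 7.0, 10.0]
```

Since every midpoint residual is split equally between its two coarse neighbors, the sum of the restricted residuals equals the sum of the fine grid residuals.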

The prolongation process similarly is considerably simplified because of the mesh 
embedding. Coarse grid corrections to the solution are prolongated by direct injection at those fine 
grid nodes that coincide with the coarse nodes. For those fine grid nodes that lie in between the 
coarse nodes, the corrections are determined as averages of the corrections at the two surrounding 
coarse nodes. For example, in Figure 3(a), the coarse grid corrections at nodes P, A, and B are 
injected onto the next finer grid, whereas the corrections at a node such as D are determined as 
averages of the corrections at P and A. 
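A minimal sketch of this prolongation is given below, under the same assumptions as before: coarse nodes keep their indices on the fine grid, and a hypothetical `parents` map lists the two coarse nodes on either side of each in-between fine node.

```python
def prolong_correction(coarse_corr, n_fine, parents):
    """Transfer coarse-grid corrections to the fine grid.

    Coincident nodes receive the coarse value by injection; in-between
    nodes receive the average of their two surrounding coarse nodes.
    """
    fine = [0.0] * n_fine
    fine[:len(coarse_corr)] = coarse_corr
    for m, (i, j) in parents.items():
        fine[m] = 0.5 * (coarse_corr[i] + coarse_corr[j])
    return fine


# One coarse triangle (nodes 0, 1, 2) with midpoints 3, 4, 5.
parents = {3: (0, 1), 4: (1, 2), 5: (0, 2)}
print(prolong_correction([2.0, 4.0, 6.0], 6, parents))
# [2.0, 4.0, 6.0, 3.0, 5.0, 4.0]
```

With this numbering, the weights (1 at coincident nodes, 1/2 at in-between nodes) are the same ones that appear in the residual summation described earlier in this section, so the two transfers are transposes of each other.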

5. TEST CALCULATIONS 

We shall now present the performance of the algorithm in three model flow problems that 
illustrate the potential of the technique in calculating complex internal flows. The three selected 
problems reflect complex geometry, elliptic nature of the flow field and the presence of very fine 
scale variations in the flow that can only be resolved by a very fine mesh. In future, other problems 
that contain inflows and outflows, periodic boundary conditions and turbulence equations will be 
considered. The main point to be demonstrated here is that the method converges rapidly and that 
the rate of convergence is independent of the mesh density. In comparison with the single grid 
convergence shown in Figure 2, the multigrid method should save a large number of iterations. 
This is indeed the case as will be presented below. 

Laminar Flow in a Square Cavity 

We have conducted a systematic testing of the influence of the flow Reynolds number, the 
under-relaxation factors and the mesh density for three model driven cavity problems. The first one 
is the familiar problem of flow in a driven square cavity. In our tests, the square cavity is 
discretized by triangular elements. The triangulation is performed by the Delaunay procedure. 
Several levels of grid are then superimposed over the coarsest grid. Since upwinding was not used 
in the present study, for each mesh level, there was a limiting value of the Reynolds number 
beyond which convergence was not possible. Therefore, in the multigrid sequence, the desired 
Reynolds number was used only on the finest mesh. Iterations on each of the coarser meshes were
performed with its stable maximum value of the Reynolds number, following the concept of
double discretisation. Two fixed V-cycles were examined. In the first, the number of iterations on
the coarse grids increased as the coarsest grid was approached. On the locally finest grid, one
iteration was performed. The next grid used two relaxations and the subsequent one three and so
on. The same number of relaxations were performed on the up-leg of the V-cycle, except at the top
of the V-cycle. In the second fixed cycle, a fixed number of three coarse grid relaxations were
performed accompanied by one relaxation on the finest grid. Both schemes were well convergent
except for minor differences in the rates of convergence and the CPU times. 
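The two relaxation schedules can be written down explicitly, together with a rough cost estimate in units of one finest-grid relaxation. The estimate assumes that each coarser grid has one quarter of the elements of the next finer one, that the coarsest grid is visited once per cycle, and that the finest grid is not relaxed again at the top of the up-leg. The paper does not spell out its own work-unit accounting, so the numbers are indicative only and the function names are hypothetical.

```python
def relaxation_schedule(n_grids, kind):
    """Relaxations per level on the down-leg; level 0 is the finest grid."""
    if kind == "increasing":
        return [level + 1 for level in range(n_grids)]
    if kind == "fixed":
        return [1] + [3] * (n_grids - 1)
    raise ValueError("kind must be 'increasing' or 'fixed'")


def work_units(schedule, ratio=4.0):
    """Cost of one V-cycle in finest-grid relaxations (see assumptions above)."""
    down = sum(n / ratio**level for level, n in enumerate(schedule))
    # Up-leg: intermediate levels only (not the coarsest, not the finest).
    up = sum(n / ratio**level for level, n in enumerate(schedule)
             if 0 < level < len(schedule) - 1)
    return down + up


for kind in ("increasing", "fixed"):
    schedule = relaxation_schedule(5, kind)
    print(kind, schedule, work_units(schedule))
```

Under these assumptions, both five-grid cycles cost between two and three finest-grid relaxations, so the coarse grid work adds only a modest overhead to each fine grid iteration.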

Figure 4 shows the convergence history for a Reynolds number of 50 for different mesh 
densities, with the mass residual plotted against the number of iterations on the finest grid. In all 
the runs, the coarsest grid had 40 elements and 29 nodes. The finest grid in the 5-grid run had 
10240 elements and 5249 nodes. It is apparent from the plots that the rate of convergence in all 
cases is nearly independent of the grid size. There is a five order decrease in the mass residual in 
less than 20 multigrid cycles. This may be compared with the convergence shown (for 640 
elements) if only a single grid is used. Figure 5 shows the multigrid convergence for the highest 
permitted Reynolds number of 500 which requires a slightly larger number of iterations due to the 
increased nonlinearity. The calculated results agreed well with previously reported results of Ghia
et al. [20] and Vanka [13]. 

Laminar flow in a triangular cavity 

The flow in a triangular cavity wherein the fluid motion is set by the motion of the top wall is 
an interesting complex flow which results in an infinite number of vortices of diminishing intensity 
towards the lower corner of the cavity [21, 22]. Although the square cavity has been studied 
extensively, there has been very little numerical work reported on the triangular cavity [23]. The 
triangular cavity cannot be easily discretized by a curvilinear mesh that is smooth and has high 
quality. However, it is ideally suited for triangulation. For the calculations presented here, the 
depth of the cavity is twice the width of the top wall. Here, as in the square cavity, the top wall is 
moved to the right with a velocity u = 1. A series of Reynolds numbers up to 800 were considered 
and the performance of the method was evaluated. Here the Reynolds number is defined with 
respect to the depth of the cavity and the top wall velocity. 

Figures 6 and 7 show the multigrid convergence of the code for Reynolds numbers of 50 
and 800. Linear convergence is observed even with 12288 elements and 6305 nodes. The velocity 
vectors and streamtraces in the flow field are shown in Figures 8 and 9 for Reynolds numbers of 
50 and 800. The occurrence of the series of vortices is replicated by the calculations to the point 
of grid resolution. Further resolution near the bottom corner should reveal more and more eddies
of smaller dimension. Moffatt [21] has shown that for Stokes flow, the distance of each eddy
from the corner increases in geometric progression as does its intensity. This was indeed seen for 
all the eddies except for the one near the top wall. Therefore, starting from the second eddy, the 
ratios of successive distances from the corner for Re = 50 are respectively, 1.97, 1.98 and 1.9. 
The deviation from the expected series for the topmost eddy is probably because of the breakdown 
of the Stokes flow assumption there. Near the top wall, inertial effects dominate, and Moffatt's
analysis is not valid there. 

Laminar flow in a semicircular cavity 

The final problem considered is the flow in a semi-circular cavity which has a curved 
boundary. In this case, the coarsest triangulation does not capture the true shape of the boundary.
However, as the mesh is refined, the fine grid points are moved to the boundary to fit the shape.
Thus a better representation of the boundary is obtained. For this geometry also, several Reynolds
numbers and mesh densities were considered. As a representative plot, Figure 10 shows the
convergence for the Reynolds number of 500 discretized with 3584 elements and 1873 nodes. The
consistency of the coarse grid and fine grid transfers is demonstrated by this rate of convergence. It
is to be noted that only the near boundary elements are altered and no remeshing is performed. This 
preserves the restriction/prolongation practices that are valid in the interior. The velocity vectors 
and streamtraces in the flow field for Re = 500 are shown in Figure 11. 

Table 1 summarizes all the calculations currently performed with this procedure. The 
corresponding work units are also presented, which account for the coarse grid iterations. The
work involved in the injections and interpolations during restriction and prolongation is neglected 
as per the standard practice in multigrid literature. 

6. CONCLUSIONS 

In this paper, a multigrid method for unstructured grids based on geometric coarsening 
(versus algebraic coarsening, Webster [14]) has been presented. A sequence of embedded grids 
has been used to smooth out low frequency errors, and accelerate the convergence on fine grids. 
The momentum and continuity equations are discretized by a control volume procedure with equal 
order interpolations for the variables. The mass continuity equation is transformed to a pressure 
equation which is derived through special interpolations that provide a well-connected pressure 
field. A simple iterative scheme such as the Gauss-Seidel method has been used to relax the 
discrete equations on any grid. The coarse grid pressure equation is constructed by a consistent 
restriction of the cell face fluxes and appropriate equations. It is demonstrated that the method
provides good multigrid convergence in the three test problems for all Reynolds numbers up to 
the value permitted by the cell Reynolds number criterion of the central differencing scheme. 
Future extensions to this procedure are underway to include periodic boundary conditions, 
turbulence models, time-dependent terms, and three-dimensional variations.

Figure 1: (a) Unstructured mesh with control volume around node P; (b) Element PAB and local
coordinate system

Figure 2: Single grid convergence for shear driven flow in a square cavity with increase in number of
elements (horizontal axis: number of iterations)

Figure 4: Multigrid and single grid convergence for laminar flow in a square cavity at Re = 50

Figure 5: Multigrid and single grid convergence for laminar flow in a square cavity at Re = 500

Figure 6: Multigrid and single grid convergence for laminar flow in a triangular cavity at Re = 50

Figure 10: Multigrid and single grid convergence for laminar flow in a semicircular cavity at Re = 500,
with 3584 elements

Figure 11: Velocity vectors and streamtraces for laminar flow in a semicircular cavity at Re = 500

Table 1: Number of fine grid iterations for a five order decrease in the residuals, shown as a function
of the number of elements and the Reynolds number. Each fine grid iteration corresponds to three
work units

Abstract

A multigrid-mask method for solution of incompressible Navier-Stokes equations in primitive variable 
form has been developed. The main objective is to apply this method in conjunction with the
pseudospectral element method solving flow past multiple objects. There are two key steps involved in calculating
flow past multiple objects. The first step utilizes only Cartesian grid points. This homogeneous or mask
method step permits flow into the interior rectangular elements contained in objects, but with the
restriction that the velocity for those Cartesian elements within and on the surface of an object should be
small or zero. This step easily produces an approximate flow field on Cartesian grid points covering the 
entire flow field. The second or heterogeneous step corrects the approximate flow field to account for 
the actual shape of the objects by solving the flow field based on the local coordinates surrounding each 
object and adapted to it. The noise occurring in data communication between the global (low frequency) 
coordinates and the local (high frequency) coordinates is eliminated by the multigrid method when the 
Schwarz Alternating Procedure (SAP) is implemented. 

Two dimensional flow past circular and elliptic cylinders will be presented to demonstrate the
versatility of the proposed method. An interesting phenomenon is found that when the second elliptic cylinder
is placed in the wake of the first elliptic cylinder a traction force results in a negative drag coefficient. 

1 Introduction 

The multigrid-mask method is motivated by a drawback of grid generation: laying out the desired
grid points for flow past multiple objects often takes tremendous effort. As expected, grid
generation becomes even more difficult when the objects are close to each other or move randomly.
Such situations occur in many physical problems, such as cross flow in shell-tube heat exchangers,
two-phase flow in multiple particle sedimentation, and flow of blood cells in arterioles,
capillaries, and venules (Stokes flow).

Conventional numerical simulation of Navier-Stokes (or Stokes) flow with multiobject systems falls
into two main categories: (I) distinguishable and (II) indistinguishable fluid-object interfaces.
Category I defines a distinct boundary between objects and fluid, so exact boundary conditions
(velocity and force) can be prescribed on the surface of objects. In effect, this category partitions
the entire flow domain into two heterogeneous systems: the objects (which may or may not have fluid
inside) and the fluid. It can provide highly accurate details of the flow interaction among objects
but is computationally intensive (not more than three objects). Ingber [1] and Tran-Cong &
Phan-Thien [2] use the boundary element method for suspensions of rigid particles in Stokes flow, and
Li, Zhou, & Pozrikidis [3] use the boundary element method for deformable particles.

Category II implies that a fuzzy boundary exists between objects and fluid. In other words, there is
no distinct boundary between objects and fluid; therefore, a homogeneous system can be applied to the
entire domain. As a result, a single set of fluid dynamics equations holds at all grid points (a
“stationary” grid) of the domain and no internal boundaries need to be defined; i.e., the original
boundary conditions, the forces on the fluid-object surfaces, now become an additional inhomogeneous
source term in the Navier-Stokes equations. However, a sharp discontinuity in the velocity field (or
other variables) across the fluid-object interfaces should be preserved in conformity with the
original problem. To maintain a sharp front at the fluid-object interfaces, the fuzzy boundary should
be restricted to within a few mesh distances; the smaller the mesh distance, the better the
resolution of the fluid-object interfaces. A variety of means to achieve the desired sharp
fluid-object interface have been suggested by many investigators [4, 5, 6]. Basically, the flow field
is discretized by a finite difference approximation on a stationary grid that covers the entire flow
domain. For moving or deformable objects, a separate object grid that conforms to the geometry of the
objects needs to be defined, and this object grid is allowed to move with the velocity interpolated
from the stationary grid. Moving or deformable cases are beyond the scope of this paper.

Briscolini and Santangelo [5] proposed the spectral method to solve the incompressible unsteady flow 
over a circular cylinder by introducing a strip zone (or equivalent to stationary boundary layers in which a 
steep change of field variables occurs) of control within a few meshes. A narrow mask (Gaussian) function, 
defined as zero inside the objects and one elsewhere along with a smooth connection between these two 
values within the strip zone, is applied to the velocity field. The drawback of the mask method is that it 
only provides an approximate flow field due to an inexact capturing of the configuration of the objects by a 
stationary grid alone as well as the thickness of the fuzzy boundary (a few meshes wide) between the fluid 
and cylinder. Peskin [6] adopted the immersed boundary method for numerical simulation of blood flow 
in the human heart. His idea is very similar to the mask method of Briscolini and Santangelo [5] except a 
separate material grid is added to trace the heart wall movement. For the data communication between 
the stationary grid and the material grid, Peskin [6] employs an approximation to the delta-function to 
define the interpolated velocity and force transferred between the fluid-object system. 
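
The mask idea described above can be illustrated with a minimal sketch. The text only says that the mask is zero inside the object, one elsewhere, and smoothly connected within a narrow strip; the specific profile below (one minus a Gaussian of the distance to the surface), the function name `gaussian_mask`, and all parameter values are assumptions made for illustration and are not taken from [5].

```python
import numpy as np

def gaussian_mask(x, y, center, radius, width):
    """Mask that is 0 inside a circular object and rises smoothly to 1 outside.

    Assumed profile: 1 - exp(-(d / width)^2), with d the distance to the surface.
    """
    dist = np.hypot(x - center[0], y - center[1]) - radius  # signed distance to the surface
    outside = np.clip(dist, 0.0, None)                      # zero for points inside the object
    return 1.0 - np.exp(-(outside / width) ** 2)

n = 65
xs = np.linspace(-2.0, 2.0, n)
x, y = np.meshgrid(xs, xs, indexing="ij")
h = xs[1] - xs[0]

# Strip zone a few meshes wide around a cylinder of radius 0.5 (illustrative values).
mask = gaussian_mask(x, y, center=(0.0, 0.0), radius=0.5, width=2.0 * h)

u = np.ones_like(x)      # uniform stream on the stationary grid
u_masked = mask * u      # velocity is suppressed inside the object
```

Narrowing `width` sharpens the fluid-object interface, at the cost of a steeper change of the field variables across the strip zone.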

The objective of this paper is to develop a numerical method that combines the desired features of
both categories I and II and that can also accurately simulate the flow interaction among multiple
objects. In practice, it includes two major steps: (1) apply a stationary grid to obtain a fast
solution covering the entire domain, which is similar to the category II approach but differs in
requiring that the velocity on the stationary grid points falling inside objects be small or zero
(hereafter named the homogeneous step or mask method); and (2) generate a local fluid grid
surrounding the objects to capture their surface configuration exactly, which is similar to category
I in prescribing exact boundary conditions on the surface of objects (the heterogeneous step). Notice
that step (1) only provides an approximate flow field, and step (2) corrects the approximate flow
field predicted by step (1) by imposing exact boundary conditions on the surface of objects.

In domain overlapping terminology, one can regard the local fluid (or fine) grid as being fully
overlapped with the global stationary (or coarse) grid. Data communication between the stationary
and fluid grids can be conducted by the Schwarz Alternating Procedure (SAP) [7]. Although the grid
points of the two grid systems do not coincide in the overlapping area, the SAP iterative scheme can
still be used effectively for data communication between the stationary and fluid grids in
conjunction with the multigrid method [8, 9]. The role of the multigrid method in the SAP process is
to ensure smooth data interpolation between the global stationary and local fluid grids without
introducing any high-frequency error.

The solution of the Navier-Stokes equations is implemented by the pseudospectral element method,
which extends the global pseudospectral method to an element-type method by requiring $C^0$
continuity of the function across the interface between two adjacent elements [10] when
calculating the derivatives of a function.


2 Primitive Variable Formulation 


2.1 Navier-Stokes Equations 


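In tensor notation, the time-dependent Navier-Stokes equations in dimensionless form can be written
as

$$\frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \frac{1}{Re} \frac{\partial^2 u_i}{\partial x_j^2}, \tag{1a}$$

$$\frac{\partial u_i}{\partial x_i} = 0. \tag{1b}$$

Here $u_i$ is the velocity component and $Re$ is the Reynolds number.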

The method applied to solve the Navier-Stokes equations is Chorin’s [11] splitting technique.
According to this technique, the equations of motion are written in the form

$$\frac{\partial u_i}{\partial t} + \frac{\partial p}{\partial x_i} = F_i, \tag{2}$$

where $F_i = -u_j\,\partial u_i/\partial x_j + (1/Re)\,\partial^2 u_i/\partial x_j^2$.

The first step is to split the velocity into a sum of predicted and corrected values. The predicted
velocity $\bar{u}_i$ is determined by time integration of the momentum equations without the pressure
term:

$$\bar{u}_i = u_i^n + \Delta t\, F_i^n. \tag{3}$$


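The second step is to determine, from the predicted velocity $\bar{u}_i$, the pressure and corrected
velocity fields that satisfy the continuity equation by using the relationships

$$u_i^{n+1} = \bar{u}_i - \Delta t\, \frac{\partial p}{\partial x_i}, \tag{4a}$$

$$\frac{\partial u_i^{n+1}}{\partial x_i} = 0. \tag{4b}$$

Here the superscript $n$ denotes the $n$-th time step.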

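An equation for the pressure can be obtained by taking the divergence of Eq. (4a). In view of Eq.
(4b), we obtain

$$\frac{\partial^2 p}{\partial x_i^2} = \frac{1}{\Delta t} \frac{\partial \bar{u}_i}{\partial x_i}, \tag{5}$$

where $\bar{u}_i$ is the predicted velocity.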

Note that the pressure solution on the global stationary grid is solved numerically by separation of 
variables [7], while the Generalized Conjugate Residual (GCR) method [12] is used to iteratively solve 
the pressure equation on the local fluid grid. 
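
The splitting of Eqs. (3)-(5) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a doubly periodic square box with a Fourier discretization, not the pseudospectral element discretization, separation-of-variables solver, or GCR solver of this paper, and the function names (`wavenumbers`, `ddx`, `predict`, `correct`) and parameter values are hypothetical.

```python
import numpy as np

def wavenumbers(n, length=2.0 * np.pi):
    """Angular wavenumbers of an n x n periodic grid on a square of side `length`."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")

def ddx(f, k_axis):
    """Spectral derivative of a periodic field along the axis with wavenumbers k_axis."""
    return np.real(np.fft.ifft2(1j * k_axis * np.fft.fft2(f)))

def predict(u, v, dt, re, kx, ky):
    """Eq. (3): advance the momentum equations without the pressure term."""
    def rhs(w):  # F_i = -u_j du_i/dx_j + (1/Re) d^2 u_i/dx_j^2
        lap = np.real(np.fft.ifft2(-(kx**2 + ky**2) * np.fft.fft2(w)))
        return -(u * ddx(w, kx) + v * ddx(w, ky)) + lap / re
    return u + dt * rhs(u), v + dt * rhs(v)

def correct(u_bar, v_bar, dt, kx, ky):
    """Eqs. (5) and (4a): solve the pressure equation, then subtract dt * grad(p)."""
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                                   # the mean of p is arbitrary
    div_hat = np.fft.fft2(ddx(u_bar, kx) + ddx(v_bar, ky))
    p = np.real(np.fft.ifft2(-div_hat / (dt * k2)))  # Laplacian(p) = div(u_bar) / dt
    return u_bar - dt * ddx(p, kx), v_bar - dt * ddx(p, ky), p

n, dt, re = 33, 0.01, 100.0          # odd n avoids the unpaired Nyquist mode
x = np.arange(n) * 2.0 * np.pi / n
X, Y = np.meshgrid(x, x, indexing="ij")
kx, ky = wavenumbers(n)

u0 = np.sin(X) * np.cos(Y)           # divergence-free initial field (illustrative)
v0 = -np.cos(X) * np.sin(Y)

u_bar, v_bar = predict(u0, v0, dt, re, kx, ky)
u1, v1, p = correct(u_bar, v_bar, dt, kx, ky)

div_bar = ddx(u_bar, kx) + ddx(v_bar, ky)   # predicted field is not divergence free
div_new = ddx(u1, kx) + ddx(v1, ky)         # corrected field satisfies Eq. (4b)
```

The predicted field carries a divergence of order $\Delta t$, which the pressure correction removes to round-off in this periodic setting.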
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3 Domain Decomposition with Multigrid-Mask Method 

As mentioned in the Introduction, the calculation of flow past multiple objects involves two major
steps: a homogeneous step and a heterogeneous step. Data communication between the stationary and
fluid grids by the multigrid method is described as part of the heterogeneous step. Each step is
addressed as follows.


3.1 Mask Method - Homogeneous Step


A single coordinate system is used to produce a stationary grid that covers the entire fluid-object
domain. Usually, several stationary grid points are contained inside the objects. This homogeneous
step is sometimes called the mask method, and it is analogous to the methods proposed by Briscolini &
Santangelo [5] and Peskin [6]. In other words, it permits flow into the interior stationary grid
points contained in the objects and treats the fluid and objects as a homogeneous (whole) system; no
distinction between the fluid and objects is made. However, the velocity on the stationary grid
points confined in the objects is required to be small or zero.

In this step, the Cartesian grid points are extended to cover the interior of each object as well as
the rest of the domain. This approach lets us take advantage of a fast solver, because the operator
retains the desired form of a complete Laplacian.

As pointed out in the Introduction, the mask method only provides an approximate flow field because
the Cartesian grid points contained in the objects cannot accurately represent the configuration of
the objects themselves. In addition, the flow field on the Cartesian grid points inside or on the
surface of the objects should be prescribed to comply with the original problem, i.e., no flow or
small velocity inside the objects (including on the surface).

Such a criterion, equivalent to finding the predicted velocity $\bar{u}_i$ inside the objects that
appears in Eq. (4a), can be met by setting

$$\bar{u}_i = u_i^p + \Delta t\, \frac{\partial p}{\partial x_i} \tag{6}$$

on the Cartesian grid points confined in the interior of objects. Here the superscript $p$ refers to
the prescribed velocity. Presumably, this should implicitly force $u_i^{n+1}$ to be equal to the
prescribed value. However, due to the nonsmooth flow field exhibited around the fluid-object
interfaces, simply choosing the predicted velocity to be zero or constant does not guarantee that the
velocity $u_i^{n+1}$ obtained from Eq. (4a) equals $u_i^p$ inside the objects after solving Eq. (5).
Thus, the predicted velocity $\bar{u}_i$ inside the objects is obtained by the repeated solution of
Eqs. (5) and (6). Usually, only 1 to 2 iterations are required to
ensure that || — vP ||< 10“^ after a few hundred time steps. 
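
The repeated solution of Eqs. (5) and (6) can be sketched as follows. The setting is an assumption chosen for brevity: a single time step on a doubly periodic box with a Fourier Poisson solver and a circular object, rather than the eigenfunction solver and geometries used in this paper. The names `dt_grad_p`, `inside`, and `interior_norm` are hypothetical.

```python
import numpy as np

n = 33                                     # odd n avoids the unpaired Nyquist mode
x = np.arange(n) * 2.0 * np.pi / n
X, Y = np.meshgrid(x, x, indexing="ij")
k = 2.0 * np.pi * np.fft.fftfreq(n, d=2.0 * np.pi / n)
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2
k2[0, 0] = 1.0                             # the mean of p is arbitrary

def dt_grad_p(u_bar, v_bar):
    """Solve Eq. (5) spectrally and return dt * grad(p) for the predicted velocity."""
    div_hat = 1j * kx * np.fft.fft2(u_bar) + 1j * ky * np.fft.fft2(v_bar)
    phi_hat = -div_hat / k2                # phi = dt * p
    return (np.real(np.fft.ifft2(1j * kx * phi_hat)),
            np.real(np.fft.ifft2(1j * ky * phi_hat)))

inside = (X - np.pi) ** 2 + (Y - np.pi) ** 2 <= 0.8 ** 2   # grid points covered by the object
u_p = v_p = 0.0                                            # prescribed velocity in the object

u_bar = np.where(inside, u_p, 1.0)         # uniform stream outside, first guess u^p inside
v_bar = np.where(inside, v_p, 0.0)

interior_norm = []
for _ in range(5):
    gx, gy = dt_grad_p(u_bar, v_bar)
    u_new, v_new = u_bar - gx, v_bar - gy                  # Eq. (4a)
    err = (u_new[inside] - u_p) ** 2 + (v_new[inside] - v_p) ** 2
    interior_norm.append(np.sqrt(err.sum()))
    u_bar[inside] = u_p + gx[inside]                       # Eq. (6) inside the object
    v_bar[inside] = v_p + gy[inside]
```

In this periodic setting each pass applies a contraction to the interior residual, so the discrete norm of $u^{n+1} - u^p$ inside the object does not grow from one pass to the next.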


3.2 Multigrid Method - Heterogeneous Step 

In order to correct the approximate flow field predicted from the homogeneous step (based on the
stationary grid), the heterogeneous step next accounts for the actual shape of the objects by adding
their own local coordinates; an external fluid grid surrounds each object. The mask method does not
define a distinct interface between fluid and objects; rather, the fuzzy interface falls within a few
meshes. As a result, such fluid-object interfaces need to be defined, and this is what the
heterogeneous step accomplishes. The boundary condition on the surface of objects is simply the
no-slip velocity.

In view of the domain decomposition approach for flow past multiple objects, one can regard the
local subdomains (the fine grid, i.e., the fluid grid surrounding each object) as fully overlapped
with the global rectangular domain (the coarse grid, i.e., the stationary grid), as depicted in Fig.
1. The iterative SAP technique is naturally suited to the data communication between the fluid and
stationary grids: the global stationary grid provides the outer boundary information for the local
fluid grid, and in turn the local fluid grid corrects the flow field outside the objects by imposing
exact boundary conditions on the fluid-object interfaces.

Because the two grid systems differ in orientation and resolution, simply exchanging data through
interpolation in the overlapping area of the stationary (coarse) and fluid (fine) grids introduces
high-frequency error from the fine-grid (fluid grid) subdomain, which affects the results throughout
the whole computational domain. The technique of filtering this high-frequency noise is known as the
multigrid method. The coarse-grid correction process often used in the multigrid method is adopted in
the overlapping area for the coupled pressure and velocity field, as proposed by Ku & Ramaswamy
[9]:

$$\nabla_c \cdot u_c - \nabla_c \cdot (I_f^c u_f) = I_f^c\,(r_f - \nabla_f \cdot u_f). \tag{7}$$

Here $\nabla_c \cdot$ represents the divergence operator on the coarse-grid domain, $I_f^c$ is an
interpolation operator from the fine-grid subdomain “f” to the coarse-grid domain “c,” $u$ is the
velocity, and $r_f$ is the divergence of the velocity field, which should be set to zero at the first
SAP iteration. The left-hand side of Eq. (7) is the difference between the coarse-grid operator
acting on the coarse-grid domain and the coarse-grid operator acting on the interpolated fine-grid
subdomain (which is held fixed). When the term $\nabla_c \cdot u_c$ appearing in Eq. (7) is
substituted by Eq. (4a), the pressure equation in the coarse-grid domain is obtained, and likewise
the pressure equation in the fine-grid domain. In effect, Eq. (7) couples the pressure and velocity:
not only must the residual on the right-hand side of Eq. (7) vanish, but the velocity field must also
remain unchanged during the SAP iteration.

In the overlapping area $r_f$ cannot be predetermined and needs to be adjusted until the velocity
field generated from the coupled pressure equations $\nabla_c \cdot u_c = \nabla_c \cdot (I_f^c u_f)$
and $\nabla_f \cdot u_f = r_f$ is unchanged.

Once the residual $r_f - \nabla_f \cdot u_f$ and the velocity field no longer change in the fine-grid
subdomain, this implies that

$$u_c = I_f^c u_f. \tag{8}$$

Whenever either the residual $r_f - \nabla_f \cdot u_f$ or the velocity field in the fine-grid
subdomain still varies, Eq. (7) acts as a coarse-grid correction process that transfers the
correction of the velocity field back to the fine-grid subdomain, i.e.,

$$u_f^{new} = u_f^{old} + I_c^f\,(u_c - I_f^c u_f^{old}), \tag{9}$$

where $I_c^f$ interpolates from the coarse-grid domain to the fine-grid subdomain. This is vital for
the success of the scheme: changes in the velocity field, rather than the velocity field itself, are
transferred back to the fine-grid subdomain. At each SAP iteration, $r_f$ can simply be chosen as
$r_f = \nabla_f \cdot u_f$ from Eq. (7).

The multigrid-mask SAP iterative solution of the incompressible Navier-Stokes equations in primitive 
variable form for flow past multiple objects (also shown in Fig. 1) is summarized by the following algorithm: 

1. First assume $u^{n+1}$ on the outer boundary of each object. Usually $u^n$ will be a good initial
guess.

2. Solve the fine-grid (fluid grid) system, where the pressure solution is obtained by the
preconditioned Generalized Conjugate Residual (GCR) method.

3. With the interpolated solution $I_f^c u_f$ from step (2) through Eq. (8) in the overlapping area,
solve the pressure on the coarse-grid domain (stationary grid) by the mask method with the
eigenfunction expansion technique, and also update $u_f$ in the overlapping area of the fine-grid
domain by the coarse-grid correction process in Eq. (9).

4. Repeat steps (2) & (3) until the velocity $u_c$ in the overlapping area satisfies the convergence
criterion of Eq. (8).

It is worthwhile to emphasize that, even with the strong discontinuity exhibited by the velocity on
the grid points immediately outside the objects, the multigrid-mask method indeed meets the
requirements of both having small velocity inside the objects and satisfying Eq. (8).
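
The coarse-grid correction used in step (3) can be illustrated with a one-dimensional sketch. Everything here is an assumption made for illustration: the two grids are nested 1D grids, both transfer operators are realized with linear interpolation (`np.interp`), and the names `restrict`, `prolong`, and `coarse_grid_correction` are hypothetical. The sketch contrasts transferring the change in the velocity with transferring the velocity itself.

```python
import numpy as np

def restrict(u_f, x_f, x_c):
    """Fine-to-coarse transfer: interpolate fine-grid values onto the coarse points."""
    return np.interp(x_c, x_f, u_f)

def prolong(e_c, x_c, x_f):
    """Coarse-to-fine transfer: interpolate a coarse-grid quantity onto the fine points."""
    return np.interp(x_f, x_c, e_c)

def coarse_grid_correction(u_f_old, u_c, x_f, x_c):
    """Send only the change (u_c minus the restricted fine field) back to the fine grid."""
    return u_f_old + prolong(u_c - restrict(u_f_old, x_f, x_c), x_c, x_f)

x_c = np.linspace(0.0, 1.0, 9)     # coarse (stationary) grid
x_f = np.linspace(0.0, 1.0, 65)    # fine (fluid) grid

# Fine-grid field: smooth part plus fine-scale detail the coarse grid cannot represent.
u_f = np.sin(np.pi * x_f) + 0.05 * np.sin(24.0 * np.pi * x_f)

# Pretend the coarse solve changed the solution by a smooth amount.
u_c = restrict(u_f, x_f, x_c) + 0.1

corrected = coarse_grid_correction(u_f, u_c, x_f, x_c)   # keeps the fine-scale detail
replaced = prolong(u_c, x_c, x_f)                        # interpolating u_c itself loses it

# At convergence (u_c equal to the restricted fine field) the correction leaves u_f unchanged.
unchanged = coarse_grid_correction(u_f, restrict(u_f, x_f, x_c), x_f, x_c)
```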

4 Results and Discussion 

Four SAP iterations are employed for all the test problems, and the convergence criterion of Eq. (8) is 
satisfied by the requirement || Uc — //uy 1|< 2.5 x 10“^. The radiation boundary condition [8] is applied 
at the truncated downstream boundary to minimize its influence on the upstream flow development.

4.1 Circular cylinders 

For the first benchmark test, we choose a uniform flow over a cylinder to compare results
between the multigrid scheme and the pseudospectral element method [9], in which the computational
domain is decomposed into two subdomains: an “O” grid domain partially overlapped with the Cartesian
grid domain. The ratio of the cylinder diameter to the channel width is 1/20 in this numerical
experiment; 18 × 15 elements (each element contains 7 × 7 points in the x and y directions) are
allocated in the stationary grid system, and 15 × 6 elements in the fluid (or “O”) grid system. The
periodic character of the flow motion can be characterized by the Strouhal number
$S = fD/U_{max}$, where $f$ is the shedding frequency, $D$ is the diameter of the cylinder, and
$U_{max}$ is the maximum inlet velocity. Numerical results for the drag coefficient $C_d$ and lift
coefficient $C_l$ predicted by the multigrid-mask method, $1.379 < C_d < 1.394$,
$-0.263 < C_l < 0.263$ for Re = 100 and $1.328 < C_d < 1.481$, $-0.733 < C_l < 0.733$ for Re = 250,
are in good agreement with those calculated by the multigrid method of [9]: $1.36 < C_d < 1.385$,
$-0.269 < C_l < 0.269$ for Re = 100 and $1.29 < C_d < 1.432$, $-0.711 < C_l < 0.711$ for Re = 250.
The Strouhal numbers, S = 0.168 at Re = 100 and S = 0.208 at Re = 250, also reproduce the results
found in [9]. Streamline plots presented in Fig. 2 show the typical flow motion behind the cylinder
at Re = 100 and 250, respectively.
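
As an aside on how a Strouhal number can be extracted from a computed lift history, the following sketch picks the dominant frequency of the lift signal with a discrete Fourier transform. The lift history below is synthetic, the function name `strouhal_number` is hypothetical, and the paper does not state how its shedding frequencies were measured.

```python
import numpy as np

def strouhal_number(lift, dt, diameter, u_max):
    """Estimate S = f D / U_max from a uniformly sampled lift-coefficient history."""
    lift = np.asarray(lift, dtype=float)
    spectrum = np.abs(np.fft.rfft(lift - lift.mean()))   # remove the mean before the FFT
    freqs = np.fft.rfftfreq(lift.size, d=dt)
    f_shed = freqs[np.argmax(spectrum)]                  # dominant (shedding) frequency
    return f_shed * diameter / u_max

# Synthetic lift history with a known shedding frequency (illustrative values only).
dt = 0.01
t = np.arange(10000) * dt
f_true = 0.2
lift = 0.25 * np.sin(2.0 * np.pi * f_true * t)

s = strouhal_number(lift, dt, diameter=1.0, u_max=1.0)
```

The frequency resolution of this estimate is the reciprocal of the record length, so several shedding cycles should be included in the lift history.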

We next examine Poiseuille flow past multiple cylinders at Re = 20 using the multigrid-mask
method. Figs. 3 and 4 show the element layouts of the stationary and fluid grids and the streamline
plots for flow over four cylinders whose shortest separation is 1.828 (Fig. 3a) and 0.414 (Fig. 4a)
cylinder diameters. The numerical results indicate that a lower flow rate passes through the
intercylinder area in the case of Fig. 4b than in the case of Fig. 3b. Because of the relatively
large flow rate past the outer cylinders shown in Fig. 4b, a strong separation is observed behind
the fourth (or last) cylinder.

4.2 Elliptic cylinders 

In this case, an incoming uniform flow past a slender elliptic cylinder of thickness ratio (minor to
major axis) 1:6.66 at a 45° incidence angle is studied. The Reynolds number is chosen to be Re = 200
(based on the chord length, which is twice the major axis), and the aspect ratio (the channel width
over the chord length) is 20. The stationary grid system uses 14 × 16 elements in the x and y
directions, and the fluid grid system uses 14 × 4 elements. The detailed element layout is sketched
for the first elliptic cylinder shown in Fig. 1.

When the incidence angle is 45° and the Reynolds number is Re = 200, the well-known Kármán vortex
street develops [13]. The streamline plots in Fig. 5 illustrate the history of separation behind the
elliptic cylinder within one cycle. Taking the separation to start from the leading edge, as seen in
Fig. 5a, the time evolution of separation is as follows: the separation region continues to grow
toward the trailing edge (Fig. 5b) until it reaches the trailing edge, where the maximal lift occurs.
After the separation breaks down (Fig. 5c), it restarts from the trailing edge (Fig. 5d) and then
gradually extends toward the top tip (Fig. 5e), where the minimal lift occurs. The separation also
splits into two parts: one is located immediately behind the ellipse, and the other forms a vortex
behind the body (Fig. 5f).

The lift and drag coefficients are found to be $-1.500 < C_l < -0.985$ and $1.355 < C_d < 1.781$ (as
seen in Fig. 6), which are qualitatively similar to the case with a thickness ratio of 1:10 in [13].
The Strouhal number is 0.275, in contrast with 0.25 for a thickness ratio of 1:10.

To demonstrate the capability of the multigrid-mask method in simulating the interaction among
multiple objects, we add another elliptic cylinder with thickness ratio 1:4 (its chord length is 60%
of that of the first one) in the direction of the incoming flow. Its element layout is also sketched
in Fig. 1, and it is placed in the wake of the first elliptic cylinder. Such a traction force is a
common experience: it is felt in a parked car when another car passes by at high speed, and when a
small plane flies into the wake of a big plane, a tremendous suction force can cause the small plane
to crash into the big one.

To show that the traction force acting on the second elliptic cylinder is induced by the wake of the
front one, it is natural to plot the time history of the drag coefficient of the rear one; any
negative value of the drag coefficient supports our assumption. In Fig. 7, the drag and lift
coefficients of both elliptic cylinders appear in the same plot. Evidently, the drag coefficient of
the second cylinder does become negative, which confirms that a traction force acts on the rear
elliptic cylinder. Meanwhile, the drag and lift coefficients of the front elliptic cylinder also
change ($1.30 < C_d < 1.828$, $-1.39 < C_l < -0.82$) because of the presence of the rear one. More
strikingly, the Strouhal number is reduced to 0.208, which is the same as that of the rear elliptic
cylinder (a resonant effect), whose drag and lift coefficients are $-0.139 < C_d < 0.360$ and
$-0.939 < C_l < 0.911$, respectively.

The streamline plots as seen in Fig. 8 give a detailed description of the aforementioned traction effect. 
The phenomenon of the front elliptic cylinder is very similar to that of the single case; separation starts 
from the leading edge and grows up to the trailing edge where the separation breaks down, then restarts 
from the trailing edge and extends toward the leading edge where it splits into two parts, one on the 
surface with a small intensity and another in the wake region. The traction force can be judged based 
on the vortex formation on the surface of the second elliptic cylinder. Whenever the vortex formation 
appears on the front surface of the second one, the drag coefficient turns into a negative value as indicated 
in Fig. 8c. The negative value persists during the time period (Fig. 8c - Fig. 8e) when the separation 
on the surface of the front elliptic cylinder breaks down at the tail and restarts from the bottom and 
extends toward the tip. The intensity of the traction force turns out to be the strongest when the wake 
zone resulting from the first elliptic cylinder acts on the front surface of the second one and becomes the 
largest (Fig. 8d). 

5 Conclusions 

The solution of the Navier-Stokes equations in primitive variable form has been obtained by the
pseudospectral element method via the multigrid-mask SAP domain decomposition technique. The solution
procedure for flow past multiple (or single) objects includes two basic steps: a homogeneous step
(mask method) and a heterogeneous step (multigrid method). The solution on the stationary grid is
first obtained by the mask method; then the iteration between the heterogeneous step (the solution on
the fluid grid) and the homogeneous (mask) step is repeated by the SAP technique with the multigrid
method.

The homogeneous step permits flow into the stationary grid points contained in each object, subject
to the restriction that the flow inside or on the surface of objects should be small, within the
prescribed error index. The merit of the mask method is its simplicity: it first provides an
approximate solution of the flow field by the fast eigenfunction solver. The heterogeneous step is
then used to correct the flow field predicted from the homogeneous step by taking into account the
actual contour of, and the exact boundary conditions on, the surface of objects.

From the solution point of view, the problem can be interpreted as the local fluid grid representing
the objects being fully overlapped with the global stationary grid representing the entire
computational domain. The SAP iterative technique handles the data communication between the local
and global coordinate systems. During the data exchange between the fluid grid (fine-grid) domain and
the stationary grid (coarse-grid) domain, the coarse-grid correction technique is used to eliminate
the high-frequency error caused by the data interpolation from the fine-grid domain to the
coarse-grid domain.

Test problems demonstrate the versatility of the proposed multigrid-mask method. Future research
will concentrate on the solution of flows in three-dimensional geometries.

Fig. 2 Streamline plots for flow past a cylinder for (a) Re = 100, and (b) Re = 250

Fig. 4 Flow past four cylinders at Re = 20 with (a) element layout, and (b) streamline plot

Fig. 8 Time history of streamline plots for Re = 200 at time (a) t = 0, (b) t = 0.27T, (c) t = 0.46T,
(d) t = 0.67T, (e) t = 0.77T, (f) t = 0.91T

SUMMARY


The goal of this paper is the implementation of hybrid V-cycle hierarchical multilevel methods for
the indefinite discrete systems which arise when a mixed finite element approximation is used to 
solve elliptic boundary value problems. By introducing a penalty parameter, the perturbed 
indefinite system can be reduced to a symmetric positive definite system containing the small 
penalty parameter for the velocity unknown alone. We stabilize the hierarchical spatial 
decomposition approach proposed by Cai, Goldstein, and Pasciak for the reduced system. We 
demonstrate that the relative condition number of the preconditioner is bounded uniformly with 
respect to the penalty parameter, the number of levels and possible jumps of the coefficients as long 
as they occur only across the edges of the coarsest elements. 

INTRODUCTION 


We shall be concerned with solving the discrete equations which arise when the mixed
approximation is used for second order elliptic boundary value problems. Specifically, we consider
the mixed approximation based on the Raviart-Thomas spaces [12]. Such approximations lead to
the solution of linear systems involving block matrices of the form

$$\begin{pmatrix} M & N^T \\ N & 0 \end{pmatrix}.$$

Here $M$ is symmetric and positive definite and $N^T$ is the transpose of the matrix $N$. This matrix
is clearly symmetric and indefinite.

Instead of solving this system directly, we consider solving the penalty approximation to it (cf. 
[1],[5]). This approximation involves the use of a small parameter e (10“^ ~ 10“® in practice) and 
results in a linear system involving the block form 

$$\begin{pmatrix} M & N^T \\ N & -\varepsilon I \end{pmatrix}.$$

A linear system of this form can be reduced to the solution of a system with the matrix

$$M + \varepsilon^{-1} N^T N. \tag{1}$$

Although the matrix in (1) is symmetric and positive definite, it can have a large condition number 
of the order Here, h is the discretization parameter. 

The hierarchical space decomposition method proposed in [8] reduces the above condition 
number to the order 0{h~^ log ^). That is, the dependence of the penalty parameter e has been 
removed and a reduction in the mesh dependence has been achieved. In the same paper [8], a 
negative result for the standard application of the multigrid method to the reduced system has 
been suggested. The asymptotic behavior for the standard multigrid method remains of the order 

In this paper, we stabilize the hierarchical spatial decomposition approach from [8] by allowing
hybrid V-cycle type multilevel iterations developed by Axelsson and Vassilevski (cf. [2], [3], [13],
[14]). This means that we use a pure V-cycle iteration at most of the levels while we perform a
$\nu$-fold ($\nu > 1$) cycle iteration at levels whose index is proportional to a fixed integer
parameter $k_0$. We demonstrate that the hybrid V-cycle hierarchical multilevel preconditioners
constructed in this manner give relative condition numbers that are uniformly bounded with respect to
both the penalty parameter $\varepsilon$ and the number of discretization levels if $k_0$ is
sufficiently large and $\nu$ (the number of recursive calls at every $k_0$ level) satisfies certain
inequalities determined only by $k_0$.
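
The structure of a hybrid cycle can be made concrete by counting how often each level is visited. The sketch below is only a schematic of the recursion pattern; it assumes that a level whose index is a positive multiple of $k_0$ calls the next coarser level $\nu$ times and that every other level calls it once. The exact placement of the $\nu$-fold recursion in [2], [3], [13], [14] may differ, and the function name `level_visits` is hypothetical.

```python
def level_visits(num_levels, k0, nu):
    """Count the visits to each level in one hybrid cycle started on the finest level.

    Assumption of this sketch: a level whose index is a positive multiple of k0
    calls the next coarser level nu times; every other level calls it once.
    """
    visits = [0] * (num_levels + 1)

    def cycle(level):
        visits[level] += 1
        if level == 0:
            return                      # coarsest level: solved directly
        repeats = nu if level % k0 == 0 else 1
        for _ in range(repeats):
            cycle(level - 1)

    cycle(num_levels)
    return visits

hybrid = level_visits(6, k0=3, nu=2)    # [4, 4, 4, 2, 2, 2, 1], coarsest level first
pure_v = level_visits(6, k0=3, nu=1)    # a pure V-cycle visits every level once
```

A larger $k_0$ makes the $\nu$-fold recursions rarer, so the work per cycle stays closer to that of a pure V-cycle.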

Finally, we note that there are other approaches suggested in Bramble, Pasciak, and Xu [6], 
Ewing, Lazarov, and Vassilevski [9], Mathew [11] for indefinite systems that arise in mixed finite 
element discretizations of second-order elliptic problems. Some of these methods are based on 
reducing the indefinite systems by working in divergence-free finite element spaces to obtain a 
system with a symmetric and positive definite matrix. 


STATEMENT OF THE PROBLEM 

Let $\Omega$ be a two-dimensional polygon and consider the following boundary value problem:

$$\begin{cases} -\nabla \cdot (k \nabla p) = f, & \text{in } \Omega, \\ p = 0, & \text{on } \Gamma = \partial\Omega, \end{cases} \tag{2}$$

where $f \in L^2(\Omega)$ and $k = k(x)$ ($x \in \Omega$) is bounded from above and below by some
positive constants.

We shall use the following space to describe the corresponding variational problems. We consider 
the Hilbert space 

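$$H(\mathrm{div};\Omega) = \{ v \in [L^2(\Omega)]^2 \;|\; \nabla \cdot v \in L^2(\Omega) \}$$

with norm defined by

$$\|v\|^2_{H(\mathrm{div};\Omega)} = \|v\|^2_{L^2(\Omega)} + \|\nabla \cdot v\|^2_{L^2(\Omega)}.$$

In (2), we set $u = -k \nabla p$. Then we obtain the following variational equations:

$$\begin{cases} (k^{-1} u, v) - (p, \nabla \cdot v) = 0, & \text{for all } v \in H(\mathrm{div};\Omega), \\ (\nabla \cdot u, q) = (f, q), & \text{for all } q \in L^2(\Omega). \end{cases}$$

Here $(\cdot, \cdot)$ denotes the inner product in $L^2(\Omega)$ or $[L^2(\Omega)]^2$.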

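We assume that we are given two finite dimensional subspaces

$$V^h \subset H(\mathrm{div};\Omega) \quad \text{and} \quad W^h \subset L^2(\Omega)$$

defined on a quasi-uniform mesh with elements of size $O(h)$. The mixed finite element
approximation of $(u, p)$ is then defined to be the pair $(u^h, p^h) \in V^h \times W^h$ satisfying

$$\begin{cases} (k^{-1} u^h, v) - (p^h, \nabla \cdot v) = 0, & \text{for all } v \in V^h, \\ -(\nabla \cdot u^h, q) = -(f, q), & \text{for all } q \in W^h. \end{cases} \tag{3}$$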


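Problem (3) can be reformulated in terms of operators. We define operators $M : V^h \to V^h$,
$N : V^h \to W^h$, and $N^* : W^h \to V^h$ by

$$\begin{aligned} (M u, v) &= (k^{-1} u, v), && \text{for all } u, v \in V^h, \\ (N v, q) &= -(\nabla \cdot v, q), && \text{for all } v \in V^h,\ q \in W^h, \\ (N^* q, v) &= -(q, \nabla \cdot v), && \text{for all } q \in W^h,\ v \in V^h. \end{aligned}$$

With this notation, (3) takes the following form:

$$\begin{pmatrix} M & N^* \\ N & 0 \end{pmatrix} \begin{pmatrix} u^h \\ p^h \end{pmatrix} = \begin{pmatrix} 0 \\ -f^h \end{pmatrix}, \tag{4}$$

where $f^h$ denotes the orthogonal projection of $f$ onto $W^h$.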

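The solution $(u^h, p^h)$ can be approximated by regularization (i.e., by solving a reduced system
using a penalty approximation). Let $\varepsilon > 0$ be a small (penalty) parameter. We consider the
solution $(u_\varepsilon^h, p_\varepsilon^h)$ of the following perturbed system:

$$\begin{pmatrix} M & N^* \\ N & -\varepsilon I \end{pmatrix} \begin{pmatrix} u_\varepsilon^h \\ p_\varepsilon^h \end{pmatrix} = \begin{pmatrix} 0 \\ -f^h \end{pmatrix}. \tag{5}$$

Eliminating $p_\varepsilon^h$ in (5) gives rise to the following reduced problem for
$u_\varepsilon^h$:

$$A_\varepsilon u_\varepsilon^h \equiv (M + \varepsilon^{-1} N^* N)\, u_\varepsilon^h = -\varepsilon^{-1} N^* f^h. \tag{6}$$

The operator $A_\varepsilon$ is obviously symmetric and positive definite. We note that once
$u_\varepsilon^h$ has been determined from (6), $p_\varepsilon^h$ can be computed by

$$p_\varepsilon^h = \varepsilon^{-1} (N u_\varepsilon^h + f^h).$$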
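
The elimination leading from (5) to (6) can be checked numerically on a small example. The matrices below are random stand-ins rather than finite element matrices, and all names and sizes are chosen only for illustration; the sketch confirms that the perturbed block system and the reduced system with pressure recovery give the same solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 8, 3          # illustrative numbers of velocity and pressure unknowns
eps = 1e-2               # penalty parameter (illustrative value)

B = rng.standard_normal((n_u, n_u))
M = B @ B.T + n_u * np.eye(n_u)        # symmetric positive definite stand-in for M
N = rng.standard_normal((n_p, n_u))    # stand-in for N; N* is represented by N.T
f = rng.standard_normal(n_p)           # stand-in for f^h

# Perturbed saddle-point system (5).
K = np.block([[M, N.T], [N, -eps * np.eye(n_p)]])
rhs = np.concatenate([np.zeros(n_u), -f])
sol = np.linalg.solve(K, rhs)
u_block, p_block = sol[:n_u], sol[n_u:]

# Reduced system (6), followed by recovery of the pressure.
A_eps = M + N.T @ N / eps
u_red = np.linalg.solve(A_eps, -N.T @ f / eps)
p_red = (N @ u_red + f) / eps

smallest_eigenvalue = np.linalg.eigvalsh(A_eps).min()   # positive: A_eps is SPD
```

Decreasing `eps` in this sketch enlarges the spread of the eigenvalues of `A_eps`, which is the ill-conditioning that motivates the preconditioners discussed in this paper.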
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The penalty method was analyzed in [1] and [5] for a class of mixed approximations. It follows 
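from these results that, for the Raviart-Thomas spaces [12],

$$\|u - u_\varepsilon^h\|_{H(\mathrm{div};\Omega)} + \|p - p_\varepsilon^h\|_{L^2(\Omega)} \le C \left( \inf_{v \in V^h} \|u - v\|_{H(\mathrm{div};\Omega)} + \inf_{q \in W^h} \|p - q\|_{L^2(\Omega)} + \varepsilon \|p\|_{L^2(\Omega)} \right),$$

where the constant $C$ is independent of both $\varepsilon$ and $h$.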

Moreover, we note that the problem (4) is indefinite and of saddle-point type. An adequate
approximation can be provided if the finite element spaces $V^h$ and $W^h$ satisfy the Babuska-Brezzi
stability condition (cf. Babuska [4] and Brezzi [7]). This means that for some positive constant
$\beta$ independent of the mesh parameter $h$ the following stability condition holds:

$$\sup_{v \in V^h} \frac{(\nabla \cdot v, q)}{\|v\|_{H(\mathrm{div};\Omega)}} \ge \beta \|q\|_{L^2(\Omega)}, \quad \text{for all } q \in W^h.$$


In the remainder of this section, we describe the Raviart-Thomas spaces on the triangle $T$. The
Raviart-Thomas space of order $r$ (a given nonnegative integer) on the triangle $T$ for the velocity
is defined by

$$V^h(T) = \{ v \in [P_r(T)]^2 \oplus v_0 \},$$

where

$$v_0 = \begin{pmatrix} x_1 \tilde{P}_r(x) \\ x_2 \tilde{P}_r(x) \end{pmatrix}$$

and $\tilde{P}_r(x)$ is a homogeneous polynomial of degree $r$. The corresponding space for the
pressure is given by

$$W^h(T) = P_r(T),$$

where $P_r(T)$ is the space of polynomials of degree at most $r$ defined on the triangle $T$. We also
consider the projection operator $\pi_h$ that is defined by the following:

$$\begin{aligned} (\pi_h v \cdot n, q)_E &= (v \cdot n, q)_E, && \text{for } q \in P_r(E) \text{ and all three edges } E \text{ of } T, \\ (\pi_h v, \psi)_T &= (v, \psi)_T, && \text{for } \psi \in (P_{r-1}(T))^2. \end{aligned} \tag{7}$$


HIERARCHICAL SPACE DECOMPOSITION METHOD 

In this section, we shall describe the hierarchical space decomposition method [8]. We start with a coarse initial triangulation $\mathcal{T}_0$ of the domain $\Omega$. For any element $T \in \mathcal{T}_0$, we consider the local ellipticity constants

$$\sigma_T = \frac{\sup_{x \in T} k(x)}{\inf_{x \in T} k(x)}$$

and

$$\sigma = \max_{T \in \mathcal{T}_0} \sigma_T.$$

Note that $\sigma$ can remain close to 1 even when the coefficient $k(x)$ has large jumps, as long as these occur only across edges of elements from $\mathcal{T}_0$.

We next construct a nested family of triangulations

$$\{\mathcal{T}_j\}_{j=0}^{J}$$

of the domain $\Omega$ by subdividing each element of $\mathcal{T}_j$ into four congruent ones to obtain $\mathcal{T}_{j+1}$. We consider the Raviart-Thomas velocity space $V_j$ for every triangulation $\mathcal{T}_j$ (with mesh size $h_j = 2^{-j}h_0$). For each level $j = 1, 2, \ldots, J$, we let $\pi_j = \pi_{h_j}$, where $\pi_{h_j}$ is the projection operator defined in (7). For convenience, we shall let $\pi_{-1} = 0$.

We define the spaces $M_j$ to be the images of the operators $(\pi_j - \pi_{j-1})$ acting on elements from $V_h$:

$$M_j = \{ w = (\pi_j - \pi_{j-1})v, \ \text{all } v \in V_h \}.$$

It is clear that $\{M_j\}$ are subspaces of $V_J = V_h$ satisfying

$$V_j = M_0 \oplus M_1 \oplus \cdots \oplus M_j, \qquad j = 0, 1, \ldots, J.$$

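For $j = 0, 1, \ldots, J$, we define the operator $A_j$ to be

$$(A_j v, w) = A^\varepsilon(v, w), \quad \text{for all } v, w \in V_j.$$

We next define the operators $D_j$ to be the restriction of $A_j$ to $M_j$. That is,

$$(D_j \psi, \theta) = A^\varepsilon(\psi, \theta), \quad \text{for all } \psi, \theta \in M_j,$$

where $\psi = (\pi_j - \pi_{j-1})v$ and $\theta = (\pi_j - \pi_{j-1})w$ for some $v, w \in V_J$.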

The primitive form of the hierarchical preconditioner proposed in [8] can be written as

$$(B_J v, w) = (B_0 \pi_0 v, \pi_0 w) + \sum_{\sigma=1}^{J} (D_\sigma \psi_\sigma, \theta_\sigma),$$

where $B_0 = A_0$, $\psi_\sigma = (\pi_\sigma - \pi_{\sigma-1})v$, and $\theta_\sigma = (\pi_\sigma - \pi_{\sigma-1})w$. To obtain an efficient preconditioner $\tilde{D}_j$ for $D_j$ (cf. [8]), we use the decomposition

$$\psi = \psi_H + \psi_P,$$

where $\psi_H \in M_j$ is defined element-wise for any $T \in \mathcal{T}_{j-1}$ such that

$$A^\varepsilon(\psi_H, \theta) = 0 \ \text{ for all } \theta \in M_j \text{ with } \theta \cdot n = 0 \text{ on } \partial T; \qquad \psi_H \cdot n = \psi \cdot n \ \text{ on } \partial T.$$

Let be an appropriately scaled diagonal part of such as 
(^fv,v)=C E 

all edges B {^s}^_n ^ 

of all T6Ti_i „/d-° „„ E 

for some constant C > 0 independent of hj-i and for some weights 

kE = —H- ^—r, where Ti Pi T '2 = -B and Tx,T 2 € 

max k max k 

Ti T2 

Then we can write Dj as 

(AV’fX) = + {Df^p,Xp) ■ 

The final form for the hierarchical preconditioner becomes

$$(B_J v, w) = (B_0 \pi_0 v, \pi_0 w) + \sum_{\sigma=1}^{J} (\tilde{D}_\sigma \psi_\sigma, \theta_\sigma).$$

We now state the following theorem for the hierarchical preconditioner [8] without proof. 

Theorem 1. For any vector function v G Vj, we have that 

C2-^Bf{v,v) < A^(v,v) < CJBf{v,v), 

where $C$ is a constant independent of $\varepsilon$, $J$, and the mesh size.

The above theorem shows that the hierarchical preconditioner can be used to effectively 
precondition the original form as long as J is not too large. 

HYBRID V-CYCLE MULTILEVEL PRECONDITIONERS

We shall describe the hybrid V-cycle multilevel preconditioners in this section. The construction of these multilevel preconditioners is based on the hierarchical preconditioner (8) and some polynomial acceleration techniques proposed in [2], [3], [13], and [14]. The purpose of the polynomial acceleration is to stabilize the growth of the condition number for the hierarchical preconditioner. The hybrid V-cycle multilevel preconditioner $B_j$ is defined by recursion as follows.

• Let $P_\nu(t)$ be a given polynomial of degree $\nu \ge 1$ such that

(i) $P_\nu(0) = 1$,

(ii) $0 \le P_\nu(t) < 1$ for $t \in (0, 1]$.
• For a given integer parameter $k_0 \ge 1$, we set

1. $B_0 = A_0$,

2. $(B_{k_0} v, w) = (B_0 \pi_0 v, \pi_0 w) + \sum_{\sigma=1}^{k_0} (\tilde{D}_\sigma \psi_\sigma, \theta_\sigma)$.

• For $s = 1, 2, \ldots$, $m = s k_0$, and for all $j$ such that $m < j \le m + k_0$, we first define the operator $B_m$ to be

m 

{Bm.V, w) = {BoTCoV, TTqw) + ^ {pallia-, Oa) (Vu, W G Ki)- 

(T=l 

Then the operator $B_j$ is obtained for all $v, w \in V_j$ by the relation

3 ^ 

{BjV, w) = pmT^mV, TTmw) + - TT^-lv), (tT^W - 7r^-l)wJ . 

a=m4-l 

We next present some technical lemmas which are used to prove the convergence results for the hybrid V-cycle multilevel preconditioners. We state these lemmas without proof and refer to [10] for the detailed proofs.

Lemma 1. For any function $v \in V_{j+k_0}$, we have that

$$A^\varepsilon(\pi_j v, \pi_j v) \le \eta(k_0) A^\varepsilon(v, v),$$

where rj{ko) = and the constant C is independent of j and the penalty parameter s. 

Lemma 2. Let $m$ and $j$ ($> m$) be given integers. The following inequality holds for some constant $\delta_m > 0$:

$$(A_m v, v) \le (B_m v, v) \le (1 + \delta_m)(A_m v, v), \quad \text{for all } v \in V_m.$$

Lemma 3. The following spectral equivalence relation holds for all v ^ Vj: 

J—< {Bjv,v) < (^SmV{j - m) + ri{j - m) 

+8r){l) Y 

(T=m+1 

We shall use the following polynomial Pv{t) for the preconditioner: 

i + rdS 
with

$$\nu > \sqrt{(k_0 + 1)\,\eta(k_0)}. \tag{9}$$

Here $T_\nu$ is the Chebyshev polynomial of degree $\nu$.

Let a be a small positive parameter satisfying 


sup 


1 




e [a, 1] 


l2 


{i-v&y+{i+v&y 




cr—l 


<7=1 


where the parameter a = - -. 

A:o + l 

We note that such a parameter exists under the above choice of i/ because in this case we have 
the following asymptotic relation: 


T 2 


(i - ^/a)'' + (i + v&y 

(i + Va) 


<T —1 




and for a sufficiently small a we solve the inequality for i/ 


1 1 
< 


i^'^a ari{ko) 


Let $\lambda_j$ be the largest eigenvalue of $A_j$. An upper bound for $\lambda_{j+k_0}$ is given as follows.


Lemma 4. 






a 


^0 "h 1 


,1 


(7 = 1 


a 


( 10 ) 


The multilevel preconditioner $B_j$ will be spectrally equivalent to $A_j$. We summarize these results in the following theorem.


Theorem 2. Let $\nu$ satisfy the inequality (9) for some given integer parameter $k_0 \ge 1$. For $\alpha \in (0, 1]$ that is sufficiently small and satisfies the inequality (10), the following spectral relation between $B_j$ and $A_j$ holds uniformly for $j \ge 0$:

$$\frac{1}{k_0 + 1}(A_j v, v) \le (B_j v, v) \le \frac{1}{\alpha}(A_j v, v) \quad \text{for all } v \in V_j.$$
COMPUTATIONAL COMPLEXITY OF THE PRECONDITIONER


To study the computational costs, we denote the degrees of freedom at level $j$ by $n_j$. From the triangulation process, we may assume that $n_{j+1}/n_j = 4$. Let $W_{s+1}$ be the number of arithmetic operations performed at level $(s+1)k_0$. We then obtain the following recursive formula:

$$W_{s+1} = C n_{(s+1)k_0} + \nu W_s, \tag{11}$$

where the second term on the right-hand side stands for $\nu$ recursive calls of the preconditioner $B_{sk_0}$ (the polynomial corrections at level $sk_0$). Thus, the computation of this action is


B 


-1 

sko 


I -Pu 


^0 + 1 


^sko 



( 12 ) 


We note that the first term on the right-hand side of (11) stands for the work to invert the 
block-diagonal matrices Df and and v actions of the matrix Asko involved in (12). Thus, in 
general, $C$ is a function of the parameters $\nu$ and $k_0$ ($C = C(\nu, k_0)$). To obtain an optimal order preconditioner in terms of computational complexity, we choose $\nu$ and $k_0$ such that

$$W_{s+1} \le \mathrm{const} \cdot n_{(s+1)k_0}.$$


Using (11) recursively, we obtain

$$W_{s+1} = C\left(n_{(s+1)k_0} + \nu n_{sk_0} + \cdots + \nu^s n_{k_0}\right) + \nu^{s+1} W_0 = C n_{(s+1)k_0} \sum_{r=0}^{s} \left(\frac{\nu}{2^{2k_0}}\right)^r + \nu^{s+1} W_0.$$

Hence the condition for an optimal order preconditioner is

$$\frac{\nu}{2^{2k_0}} < 1.$$

This is the constraint on the parameters $\nu$ and $k_0$ for the hybrid V-cycle multilevel preconditioner to be of optimal order.
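The effect of this constraint can be seen by iterating the work recursion numerically. The sketch below is only an illustration under stated assumptions: the function name `work`, the normalizations `c = n0 = w0 = 1`, and the sample values of `nu` and `k0` are hypothetical. It evaluates $W_{s+1} = C n_{(s+1)k_0} + \nu W_s$ with $n_{j+1} = 4 n_j$ and reports the ratio of work to unknowns.

```python
def work(s_max, k0, nu, c=1.0, n0=1.0, w0=1.0):
    """Return W_s / n_{s*k0} for s = 1..s_max, where
    W_{s+1} = c * n_{(s+1)*k0} + nu * W_s and n_{j+1} = 4 * n_j."""
    ratios = []
    w = w0
    for s in range(1, s_max + 1):
        n = n0 * 4 ** (s * k0)      # unknowns on level s*k0
        w = c * n + nu * w          # one visit plus nu recursive calls
        ratios.append(w / n)
    return ratios

bounded = work(12, k0=1, nu=3)      # nu < 2**(2*k0) = 4: work stays O(n)
unbounded = work(12, k0=1, nu=5)    # nu > 4: work per unknown keeps growing
```

With `nu=3` and `k0=1` the ratio increases towards the geometric-series limit $C/(1 - \nu/4) = 4$ and never exceeds it; with `nu=5` it grows without bound, so the method is no longer of optimal order.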

In order to make $B_j$ spectrally equivalent to $A_j$ as given in Theorem 2, we need to impose another constraint for choosing the parameters $\nu$ and $k_0$ as follows:

$$\nu > \sqrt{(k_0 + 1)\,\eta(k_0)}.$$

Therefore, we establish the following relation for the parameters $\nu$ and $k_0$ to guarantee both the optimality and the spectral equivalence property for the hybrid V-cycle multilevel preconditioner. The relation reads as follows:

2 "^'’ >p> cXoih + l)^- ( 13 ) 
These relations can always be satisfied because $k_0$ can be chosen sufficiently large. We summarize the above results in the next theorem.

Theorem 3. The hybrid V-cycle preconditioner $B_j$ with the parameters specified above gives an optimal order CG method if $\nu$ and $k_0$ satisfy the inequalities (13).

IMPLEMENTATION OF THE PRECONDITIONER


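We first consider the hybrid V-cycle multilevel preconditioner in the following matrix form:

(1) For $k = 1$, set

$$M^{(1)} = A^{(1)}.$$

(2) For $k = 2$ to $J$, we define

$$M^{(k)} = \begin{pmatrix} A_{11}^{(k)} & 0 \\ A_{21}^{(k)} & \tilde{M}^{(k-1)} \end{pmatrix} \begin{pmatrix} I & A_{11}^{(k)^{-1}} A_{12}^{(k)} \\ 0 & I \end{pmatrix},$$

where

$$\tilde{M}^{(k-1)} = M^{(k-1)} \quad \text{if } k \ne s k_0 + 1,$$

$$\left(\tilde{M}^{(k-1)}\right)^{-1} = \left[ I - p_\nu\!\left(M^{(k-1)^{-1}} A^{(k-1)}\right) \right] A^{(k-1)^{-1}} \quad \text{if } k = s k_0 + 1, \ s = 1, 2, \ldots, J/k_0 - 1. \tag{14}$$

Here, $p_\nu(t)$ is a polynomial of degree $\nu \ge 1$ such that $p_\nu(0) = 1$ and $0 \le p_\nu(t) < 1$, $t \in (0, 1]$.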

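To solve systems defined by $M = M^{(J)}$ we use the following multilevel iteration (AMLI) from [3]. Let $p_{\nu_s}^{(s)}$, $s = 1, 2, \ldots, J$, be given polynomials of degree $\nu_s$ such that $p_{\nu_s}^{(s)}(0) = 1$. Let the polynomial of degree $\nu_s - 1$ be

$$Q^{(s)}(t) = \left(1 - p_{\nu_s}^{(s)}(t)\right)/t = q_0^{(s)} + q_1^{(s)} t + \cdots + q_{\nu_s - 1}^{(s)} t^{\nu_s - 1}.$$

For a given vector $d = d^{(s)}$, the AMLI gives

$$c^{(s)} = Q^{(s)}\!\left(M^{(s)^{-1}} A^{(s)}\right) M^{(s)^{-1}} d^{(s)} = \left[ I - p_{\nu_s}^{(s)}\!\left(M^{(s)^{-1}} A^{(s)}\right) \right] A^{(s)^{-1}} d^{(s)}.$$

In particular, for the case $\nu_s = 1$ (i.e., $p_1^{(s)}(t) = 1 - t$), we have simply

$$M^{(s)^{-1}} d^{(s)} = c^{(s)}.$$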
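The equality of the two expressions for the AMLI output follows from a one-line computation. The sketch below drops the level superscripts and uses the shorthand $X = M^{-1}A$ (an abbreviation introduced here, not in the source); it only uses $p(0) = 1$, which makes $(1 - p(t))/t$ a polynomial.

```latex
% Q(t) = (1 - p(t))/t, so Q(X) X = I - p(X) for X = M^{-1} A.
\begin{aligned}
Q(X)\,M^{-1}
  &= Q(X)\,\bigl(M^{-1}A\bigr)\,A^{-1}
   = Q(X)\,X\,A^{-1} \\
  &= \bigl[\,I - p(X)\,\bigr]\,A^{-1}.
\end{aligned}
% Special case p(t) = 1 - t: Q(t) = 1, hence Q(X) M^{-1} = M^{-1}.
```

In words, applying $Q(M^{-1}A)M^{-1}$ only needs products with $A$ and solves with $M$; the inverse $A^{-1}$ on the right-hand side is never formed.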
Algorithm AMLI. Given a set of polynomials $\{p_{\nu_s}^{(s)}\}_{s=1}^{J}$ such that $p_{\nu_s}^{(s)}(0) = 1$, we set

$$Q^{(s)}(t) = q_0^{(s)} + q_1^{(s)} t + \cdots + q_{\nu - 1}^{(s)} t^{\nu - 1}, \quad \nu = \nu_s.$$

Then, for any vector $d = d^{(J)}$, the AMLI gives

$$c^{(J)} = M^{(J)^{-1}} d^{(J)}$$

in the following steps.

(0) initiate 

for $k = 1$ to $J$ set $\sigma(k) = 0$;
$k = J$;

(1) $\sigma(k) = \sigma(k) + 1$;

if $\sigma(k) = 1$ then
v(") = 0, W = 

else

(2) vf) = Aff"Vi; 

(3) d(''-i) = Wa - AgVi; 

(4) k := k — 1 

if $k > 0$ go to (1);

(5) solve on the initial level 

v(') = g!,;li(lMWdW; 

(6) set 

v(‘+') = v<‘); 

(7) v<‘«' = vr‘> - 

(8) k := k + 1; 

if $\sigma(k) < \nu_k$ go to (1);

(9) $\sigma(k) = 0$;

if $k < J$ go to (6);

(10) c(-^) = d^-^). 
NUMERICAL RESULTS


We present numerical results of the hybrid V-cycle multilevel preconditioners for the following two-dimensional discontinuous coefficient problem on the unit square $\Omega = (0,1) \times (0,1)$. In all experiments, the lowest order Raviart-Thomas triangular element is used. We consider the model problem given in (2), where the diffusion coefficients are assumed to be piecewise constant on the coarsest grid triangles. As a consequence, both the local and global ellipticity constants $\sigma_T$ and $\sigma$ are 1. In particular, we give the numerical results for the 32-subdomain case with $k^{-1}$ in each subdomain as shown in Figure 1.



Figure 1: the coefficient $k^{-1}$ on each subdomain

For each preconditioning step, we note that a set of polynomials of degree $\nu = \nu_s$

$$Q^{(s)}(t) = q_0^{(s)} + q_1^{(s)} t + \cdots + q_{\nu_s - 1}^{(s)} t^{\nu_s - 1}, \quad s = 1, 2, \ldots, J$$

is used in the AMLI algorithm. These polynomials are specified by the following set of positive integers:

$$(\nu_1, \nu_2, \ldots, \nu_J),$$

which are the degrees of the polynomials for each level. Here level 1 and level $J$ ($= 6$) are the coarsest level ($h_0 = 1/4$) and finest level ($h = 1/128$), respectively. We note that $\nu_1$ is always chosen to be one, and that the coarsest grid problem is always solved by the CG method to the machine precision $\varepsilon_{mach}$.

During the preprocessing stage, for $k = 2, 3, \ldots, 6$, we first estimate the extreme eigenvalues of $M^{(k)^{-1}} A^{(k)}$ by PCG iterations (the convergence criterion is that the reduction of the energy norm
for residuals is not less than or equal to 10~®). 
Suppose that

$$\sigma\!\left(M^{(k)^{-1}} A^{(k)}\right) \subset [c, d],$$

for some constants $c$ and $d$. Then the polynomial $Q^{(k)}$ is computed by the formula

$$Q^{(k)}(t) = \frac{1 - P_\nu(t)}{t},$$

where

$$P_1(t) = 1 - t; \qquad P_\nu(t) = \frac{1 + T_\nu[(c + d - 2t)/(d - c)]}{1 + T_\nu[(c + d)/(d - c)]}, \quad \nu = 2, 3, \ldots.$$

We see that this step can be done by table lookup since $\nu$ is a small number ($\nu \in \{2, 3\}$) in practice. We refer to the set of polynomials by the set of degrees

$$\{\nu_1 = 1, \ \nu_2 = \nu, \ \nu_3 = \nu, \ \ldots, \ \nu_J = 1\}.$$
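The shifted Chebyshev polynomial above is easy to tabulate. The following sketch is an illustration only: the function names `cheb_coeffs`, `horner`, `p_nu`, and `q_nu` and the sample interval $[c, d] = [0.1, 1]$ are assumptions made here, not taken from the paper. It builds $T_\nu$ from the three-term recurrence and evaluates $P_\nu$ and $Q = (1 - P_\nu)/t$ pointwise.

```python
def cheb_coeffs(nu):
    """Coefficients (lowest degree first) of the Chebyshev polynomial T_nu."""
    t_prev, t_cur = [1.0], [0.0, 1.0]
    if nu == 0:
        return t_prev
    for _ in range(nu - 1):
        # T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x)
        nxt = [0.0] + [2.0 * a for a in t_cur]
        for i, a in enumerate(t_prev):
            nxt[i] -= a
        t_prev, t_cur = t_cur, nxt
    return t_cur

def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficients (lowest degree first)."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc

def p_nu(nu, c, d, t):
    """P_1(t) = 1 - t; for nu >= 2 the shifted, normalized Chebyshev polynomial."""
    if nu == 1:
        return 1.0 - t
    T = cheb_coeffs(nu)
    num = 1.0 + horner(T, (c + d - 2.0 * t) / (d - c))
    den = 1.0 + horner(T, (c + d) / (d - c))
    return num / den

def q_nu(nu, c, d, t):
    """Q(t) = (1 - P_nu(t)) / t, evaluated pointwise for t != 0."""
    return (1.0 - p_nu(nu, c, d, t)) / t
```

By construction $P_\nu(0) = 1$, and on $[c, d]$ the argument of $T_\nu$ in the numerator lies in $[-1, 1]$, so $0 \le P_\nu(t) < 1$ there, which is the property required of the acceleration polynomial.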


We perform numerical experiments for the following cases:

(a) (1,1,1,1,1,1), 

(b) (1,1,2,1,1,1), 

(c) (1,1,3,1,1,1), 

(d) (1,1,2,1,2,1), 

(e) (1,1,3,1,3,1), 

(f) (1,2,1,2,1,1), 

(g) (1,3,1,3,1,1), 

(h) (1,2,2,2,2,1). 

All experiments were performed in one of the research computing facilities at the National 
Chung Cheng University. The LINPACK benchmark of the machine is about 22 mflops and the 
machine constant Cmach £ (10“^®, 10~^®). We measure the CPU time for both the preprocessing 
stage and the PCG iteration stage. We note that there is no preprocessing time for case (a) since it corresponds to the pure V-cycle hierarchical method [8].

We ran each case 5 times on the same machine and report the average time and condition numbers. The results are given in Table 1. We note that the set of polynomials (h) gives the best result for the condition number, although it is the most expensive. In addition to case (h), both (d) and (e) give very good results for the condition number. Also, most cases are less expensive than the pure V-cycle in terms of computing cost.

In Table 2, we present the results for the V-cycle and case (d). The results show that both cases have uniform condition numbers and computing times independent of the penalty parameter $\varepsilon$.
case   degrees          preprocessing   PCG iteration   total time   κ
(a)    (1,1,1,1,1,1)    0               34.33           34.33        82.3
(b)    (1,1,2,1,1,1)    .64             28.90           29.54        45.3
(c)    (1,1,3,1,1,1)    .64             29.98           30.62        39.5
(d)    (1,1,2,1,2,1)    3.66            25.49           29.15        17.5
(e)    (1,1,3,1,3,1)    4.18            32.03           36.21        12.5
(f)    (1,2,1,2,1,1)    1.89            26.50           28.39        28.1
(g)    (1,3,1,3,1,1)    2.65            33.23           35.88        22.9
(h)    (1,2,2,2,2,1)    8.01            32.09           40.10        10.5

Table 1: computing time and condition number κ



                                   ε = .001           ε = .0001          ε = .00001
(ν1, ν2, ν3, ν4, ν5, ν6)           κ      CPU time    κ      CPU time    κ      CPU time
(1,1,1,1,1,1)                      85.3   36.37       84.7   34.33       82.9   34.46
(1,1,2,1,2,1)                      18.1   27.91       17.5   25.40       17.8   26.14

Table 2: comparisons of (1,1,1,1,1,1) and (1,1,2,1,2,1) for various ε


However, the condition number for the case (d) is independent of the number of levels (there are currently six levels) while the condition number of the V-cycle does grow with the order
0{h~^ (nf- [8])- Also, the computing cost for the case (d) is quite small compared to that 

required for the V-cycle.


CONCLUSIONS 


Based on the idea of a hierarchical block preconditioner proposed by Cai, Goldstein, and Pasciak [8], we develop hybrid V-cycle multilevel preconditioners that give relative condition numbers that are uniformly bounded with respect to both the penalty parameter $\varepsilon$ and the number of discretization levels $J$ if we choose proper values for $k_0$ and $\nu$. The numerical results confirm the uniform convergence behavior for the hybrid V-cycle multilevel preconditioners.
A CONFORMING MULTIGRID METHOD FOR THE PURE
TRACTION PROBLEM OF LINEAR ELASTICITY:
MIXED FORMULATION*


Chang-Ock Lee†
Department of Mathematics 
University of Wisconsin-Madison 


SUMMARY 


A multigrid method using conforming P-1 finite elements is developed for the two-dimensional pure traction boundary value problem of linear elasticity. The convergence is uniform even as the material becomes nearly incompressible. A heuristic argument for acceleration of the multigrid method is discussed as well. Numerical results with and without this acceleration as well as performance estimates on a parallel computer are included.


1. INTRODUCTION 


Let $\Omega$ be a bounded convex polygonal domain in $R^2$ and $\partial\Omega = \cup_{i=1}^{n} \Gamma_i$. The pure traction boundary value problem for planar linear elasticity is given in the form

$$-\mathrm{div}\,\{2\mu\,\epsilon(u) + \lambda\,\mathrm{tr}(\epsilon(u))\,I\} = f \quad \text{in } \Omega, \tag{1}$$

$$\{2\mu\,\epsilon(u) + \lambda\,\mathrm{tr}(\epsilon(u))\,I\}\,\nu_i|_{\Gamma_i} = g_i, \quad 1 \le i \le n, \tag{2}$$

where $u$ denotes the displacement, $f$ the body force, $g_i$ the boundary traction, $\mu > 0$, $\lambda > 0$ the Lamé constants, and $\nu_i$ is the unit outer normal. In addition, the Lamé constants $(\mu, \lambda)$ belong to the range $[\mu_1, \mu_2] \times [\lambda_0, \infty)$, where $\mu_1$, $\mu_2$, $\lambda_0$ are fixed positive constants. The explanation for the notations used in (1) and (2) is given in [4, 6].

It is well known that the finite element method using conforming piecewise linear (P-1) finite elements converges for moderate fixed $\lambda$, and that as $\lambda \to \infty$, i.e., as the elastic material becomes incompressible, it seems not to converge at all ([1, 10]). In order to overcome this so-called locking phenomenon, the method of reduced integration was employed by Brenner [4], Falk [6] and Lee [7] in the construction of finite element methods. The finite element methods employed by them are robust in $\lambda$, i.e., they give a uniform convergence rate as $\lambda \to \infty$. In [4], Brenner proved the convergence of the P-1 nonconforming finite element method for the mixed formulation and robustness in $\lambda$ using a modification of the space used by Falk in [6]. In [7], Lee proved the convergence of the P-1 conforming finite element method for the mixed formulation and robustness in $\lambda$ using the same modification of the finite element space as Brenner used in [4]. In addition, Brenner adopted the W-cycle full multigrid method as a numerical solver for the resulting linear system and obtained the convergence of a multigrid method, which is robust in $\lambda$. For mixed problems without penalty term (e.g., the Stokes equation), a W-cycle multigrid algorithm was developed by Verfürth [9].

*This research was partially supported by the National Science Foundation under Grant No. CDA-9024618.
†Current address: Department of Mathematics, Inha University, Inchon, Korea

In this paper we present a W-cycle multigrid method to solve the linear system arising from the P-1 conforming finite element method for the mixed formulation of the pure traction boundary value problem developed in [7]. We show that the convergence is uniform with respect to $\lambda$ by following the argument adopted by Brenner in [4]. While the conforming multigrid method has the same order of convergence as the nonconforming multigrid method in [4], the former has about one third of the unknowns for the same mesh size. Moreover, in the case of parallel computation, the intergrid transfer operator of the conforming multigrid method is easier to design and has smaller communication overhead than the nonconforming one. Therefore, the conforming multigrid method promises better performance in both sequential and parallel computations. In addition, we may use this conforming multigrid method as the coarser grid correction in the multigrid algorithm for the P-1 nonconforming discretization. It gives the same convergence rate and robustness as the conforming multigrid method. In practice, V-cycle multigrid methods employing one smoothing step are convergent. Even though the P-1 conforming multigrid method is robust with respect to $\lambda$, the convergence is slow in the practical sense. Investigating the relation between eigenvalues and norms of corresponding normalized eigenfunctions $(u, p)$, we have found an unusual bimodal distribution of $\|u\|$ vs. the eigenvalues. Based on this insight, we present a heuristic argument for a faster multigrid algorithm employing a weighting factor and a damping factor. Experimental results indicate the effectiveness of these two factors.

This paper is organized as follows. In Section 2 we explain the conforming finite element method we employ. The conforming W-cycle multigrid method is discussed in Section 3. In Section 4 we give the numerical experiments for V-cycle multigrid methods on the CM-5. We also give the performance estimate on a parallel computer. In the last section we discuss the acceleration of the multigrid algorithm and give numerical results.


2. THE FINITE ELEMENT METHOD 


Throughout this paper, the letter $C$ denotes a positive constant independent of the Lamé constants and the mesh parameter $h_k$, which may vary from occurrence to occurrence even in the proof of the same theorem. For the notations of several standard differential operators, refer to [4, 6].
In order for a solution of (1) and (2) to exist, $f$ and $g_i$ must satisfy the compatibility condition

$$\int_\Omega f \cdot v \, dx\,dy + \sum_{i=1}^{n} \int_{\Gamma_i} g_i \cdot v \, ds = 0 \quad \forall\, v \in RM, \tag{3}$$

where $RM$, the space of rigid motions, is defined by

$$RM := \{ v : v = (a + by, \ c - bx), \ a, b, c \in R \}.$$

When this compatibility condition holds, the pure traction boundary value problem (1) and (2) has a unique solution $u \in H^2_\perp(\Omega)$ where

$$H^k_\perp(\Omega) := \left\{ u \in H^k(\Omega) : \int_\Omega u \cdot v \, dx\,dy = 0 \quad \forall\, v \in RM \right\}.$$

(See [4] or Chapter 3 of [7] for more detail.) Here, $H^k(\Omega)$, $k \ge 0$, denotes the usual $L^2$-based Sobolev spaces of vector-valued functions (see [5]).

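Henceforth, taking $\gamma = \lambda/(2\mu)$ and $p = \gamma\,\mathrm{div}\,u$, we consider the mixed weak formulation for (1) and (2) as follows:

Find $(u, p) \in H^1_\perp(\Omega) \times L^2(\Omega)$ such that

$$\int_\Omega \epsilon(u) : \epsilon(v) \, dx\,dy + \int_\Omega p\,(\mathrm{div}\,v) \, dx\,dy = \frac{1}{2\mu} \left[ \int_\Omega f \cdot v \, dx\,dy + \sum_{i=1}^{n} \int_{\Gamma_i} g_i \cdot v|_{\Gamma_i} \, ds \right], \tag{4}$$

$$\int_\Omega (\mathrm{div}\,u)\,q \, dx\,dy - \frac{1}{\gamma} \int_\Omega p\,q \, dx\,dy = 0 \tag{5}$$

for all $(v, q) \in H^1_\perp(\Omega) \times L^2(\Omega)$.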


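Replacing $p$ and $q$ by $\sqrt{\omega}\,p$ and $\sqrt{\omega}\,q$ ($\omega \ge 1$), respectively, we obtain the following formulation which is equivalent to (4) and (5):

Find $(u, p) \in H^1_\perp(\Omega) \times L^2(\Omega)$ such that

$$B_\omega\big((u, p), (v, q)\big) = \frac{1}{2\mu} \left[ \int_\Omega f \cdot v \, dx\,dy + \sum_{i=1}^{n} \int_{\Gamma_i} g_i \cdot v|_{\Gamma_i} \, ds \right] \tag{6}$$

for all $(v, q) \in H^1_\perp(\Omega) \times L^2(\Omega)$, where

$$B_\omega\big((u, p), (v, q)\big) := \int_\Omega \epsilon(u) : \epsilon(v) \, dx\,dy + \sqrt{\omega} \int_\Omega p\,(\mathrm{div}\,v) \, dx\,dy + \sqrt{\omega} \int_\Omega (\mathrm{div}\,u)\,q \, dx\,dy - \frac{\omega}{\gamma} \int_\Omega p\,q \, dx\,dy.$$

The quantity $\omega$ is called the weighting factor. Equation (6) has a unique solution on $H^1_\perp(\Omega) \times L^2(\Omega)$. (See [4] for more detail.)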
Let $\{\mathcal{T}^k\}$ be a family of triangulations of $\Omega$, where $\mathcal{T}^{k+1}$ is obtained by connecting the midpoints of the edges of the triangles in $\mathcal{T}^k$. Let $h_k := \max_{T \in \mathcal{T}^k} \mathrm{diam}\,T$; then $h_k = 2h_{k+1}$. Now let us define the conforming finite element space for our multigrid method CMG.

$$W_k := \{ v : v|_T \text{ is linear for all } T \in \mathcal{T}^k, \ v \text{ is continuous on } \Omega \},$$

$$W_k^\perp := \left\{ v \in W_k : \int_\Omega v \cdot w \, dx\,dy = 0 \quad \forall\, w \in RM \right\}.$$

To describe the mixed finite element method, we define

$$Q_k := \{ q : q \in L^2(\Omega) \text{ and } q|_T \text{ is a constant for all } T \in \mathcal{T}^k \}.$$

For the definition of the nonconforming finite element space, see [4, 6].

For each k, define the bilinear form B^^^k on X L^(0) by 




where $P_{k-1}$ is the $L^2$-orthogonal projection onto $Q_{k-1}$. Now, we have a conforming discretization of (6), which is a modification of the one proposed by Falk in [6]:

Find $(u_k, p_k) \in W_k^\perp \times Q_{k-1}$ such that


Bw,k (i'!ik,Pk) > (H> 9')) = ^ X ~ ~ ^ X ~ 


for all (u, g) e ^ Qk-i- 


(7) 


In Chapter 3 of [7], by proving the analogue of the classical lemma for the existence of an inverse of the divergence operator, Lee showed the uniqueness of the solution of the conforming discretization (7) with $\omega = 1$ and derived the following discretization error estimate:

$$\|u - u_k\|_{H^1(\Omega)} + \|p - p_k\|_{L^2(\Omega)} \le C h_k \left\{ \|f\|_{L^2(\Omega)} + \sum_{i=1}^{n} \|g_i\|_{H^{1/2}(\Gamma_i)} \right\}.$$

In [4], Brenner showed the uniqueness of the solution of the nonconforming discretization and derived a similar discretization error estimate.

3. THE CONFORMING MULTIGRID ALGORITHM 


In this section we present lemmas and theorems without proofs, which are found in Chapter 4 of [7]. We set $\omega = 1$ for the time being until we have a statement for $\omega > 1$. Let $B = B_1$ and $B_k = B_{1,k}$.
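Define the mesh dependent inner product by

$$\big((u, p), (v, q)\big)_k := (u, v)_{L^2(\Omega)} + h_k^2\,(p, q)_{L^2(\Omega)}.$$

The intergrid transfer operator $I_k^{k-1} : W_k \times Q_{k-1} \to W_{k-1} \times Q_{k-2}$ is defined by

$$\big(I_k^{k-1}(u, p), (v, q)\big)_{k-1} = \big((u, p), (v, q)\big)_k$$

for all $(u, p) \in W_k \times Q_{k-1}$, and $(v, q) \in W_{k-1} \times Q_{k-2}$.

Lemma 1 $I_k^{k-1} : W_k^\perp \times Q_{k-1} \to W_{k-1}^\perp \times Q_{k-2}$. □

Define $B_k : W_k \times Q_{k-1} \to W_k \times Q_{k-1}$ by

$$\big(B_k(u, p), (v, q)\big)_k = B_k\big((u, p), (v, q)\big) \quad \forall\, (u, p), (v, q) \in W_k \times Q_{k-1}.$$

Lemma 2 $B_k : W_k^\perp \times Q_{k-1} \to W_k^\perp \times Q_{k-1}$. □


Let $B_k^\perp = B_k|_{W_k^\perp \times Q_{k-1}}$.


Lemma 3 The spectral radius of $B_k^\perp \le C h_k^{-2}$. □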


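The mesh-dependent norms on $W_k^\perp \times Q_{k-1}$ are defined as follows:


Define $P_k^{k-1} : W_k^\perp \times Q_{k-1} \to W_{k-1}^\perp \times Q_{k-2}$ by

$$B_{k-1}\big(P_k^{k-1}(u, p), (v, q)\big) = B_k\big((u, p), (v, q)\big)$$

for all $(u, p) \in W_k^\perp \times Q_{k-1}$ and $(v, q) \in W_{k-1}^\perp \times Q_{k-2}$.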

The k-th level iteration scheme of the conforming multigrid algorithm: The k-th level iteration with initial guess $(y_0, z_0) \in W_k^\perp \times Q_{k-1}$ yields $CMG(k, (y_0, z_0), (w, r))$ as a conforming approximate solution to the following problem.

Find $(y, z) \in W_k^\perp \times Q_{k-1}$ such that

$$B_k(y, z) = (w, r), \quad \text{where } (w, r) \in W_k^\perp \times Q_{k-1}.$$

For $k = 1$, $CMG(1, (y_0, z_0), (w, r))$ is the solution obtained from a direct method, i.e.,

$$CMG(1, (y_0, z_0), (w, r)) = (B_1^\perp)^{-1}(w, r).$$

For $k > 1$,

Smoothing Step: the approximation $(y_m, z_m) \in W_k^\perp \times Q_{k-1}$ is constructed recursively from the initial guess $(y_0, z_0)$ and the equations

$$(y_l, z_l) = (y_{l-1}, z_{l-1}) + \frac{1}{\Lambda_k^2} B_k\big((w, r) - B_k(y_{l-1}, z_{l-1})\big), \quad 1 \le l \le m.$$

Here, $\Lambda_k := C h_k^{-2}$ is greater than or equal to the spectral radius of $B_k^\perp$, and $m$ is an integer to be determined later.

Correction Step: The coarser-grid correction in $W_{k-1}^\perp \times Q_{k-2}$ is obtained by applying the $(k-1)$-th level conforming iteration twice. In other words, it is the standard W-cycle multigrid method with $\mu = 2$. More precisely,

$$(v_0, q_0) = (0, 0) \quad \text{and}$$

$$(v_i, q_i) = CMG(k - 1, (v_{i-1}, q_{i-1}), (\bar{w}, \bar{r})), \quad i = 1, 2,$$

where $(\bar{w}, \bar{r}) \in W_{k-1}^\perp \times Q_{k-2}$ is defined by $(\bar{w}, \bar{r}) := I_k^{k-1}\big((w, r) - B_k(y_m, z_m)\big)$.

Then

$$CMG(k, (y_0, z_0), (w, r)) = (y_m, z_m) + (v_2, q_2).$$
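The smoothing step above is a Richardson iteration applied to the squared operator, which is why it also works when the operator is symmetric but indefinite. The following sketch illustrates only that point on a made-up 2x2 matrix: the names `matvec` and `smooth`, the matrix `B`, and the bound `lam` are hypothetical and unrelated to the finite element matrices of the paper.

```python
def matvec(b, x):
    """Dense matrix-vector product."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]

def smooth(b, w, x0, lam, m):
    """m steps of x <- x + lam**-2 * B (w - B x) for a symmetric matrix B."""
    x = list(x0)
    for _ in range(m):
        r = [wi - yi for wi, yi in zip(w, matvec(b, x))]   # residual w - B x
        br = matvec(b, r)                                   # B times residual
        x = [xi + bri / lam ** 2 for xi, bri in zip(x, br)]
    return x

B = [[2.0, 1.0], [1.0, -1.0]]   # symmetric, one positive and one negative eigenvalue
exact = [1.0, -2.0]
w = matvec(B, exact)            # right-hand side with known solution
lam = 3.0                       # any bound >= spectral radius of B (about 2.30 here)
x = smooth(B, w, [0.0, 0.0], lam, 200)
```

Each eigen-component of the error is multiplied by $1 - \lambda_i^2/\Lambda^2 \in [0, 1)$ per step, regardless of the sign of $\lambda_i$, so the iteration converges; components with small $|\lambda_i|$ are damped slowly, which is what the coarse-grid correction is meant to handle.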


Let the final output of the two-grid algorithm be

$$(y^*, z^*) := (y_m, z_m) + (v^*, q^*),$$

where

$$(v^*, q^*) = (B_{k-1}^\perp)^{-1} I_k^{k-1} B_k (y - y_m, z - z_m).$$

Lemma 4 $(v^*, q^*) = P_k^{k-1}(y - y_m, z - z_m)$. □

Let

$$R_k := I - \frac{1}{\Lambda_k^2} (B_k^\perp)^2.$$

Then we have

$$(y - y_m, z - z_m) = R_k^m (y - y_0, z - z_0),$$

$$(y - y^*, z - z^*) = (I - P_k^{k-1}) R_k^m (y - y_0, z - z_0).$$

Lemma 5 (Smoothing Step) There exists a constant $C$, independent of $h_k$ and $m$, such that

$$|||R_k^m(u, p)|||_{2,k} \le C h_k^{-2} m^{-1/2}\, |||(u, p)|||_{0,k} \quad \forall\, (u, p) \in W_k^\perp \times Q_{k-1}. \quad □$$


Lemma 6 (Approximation Step) There exists a constant $C$, independent of $h_k$ and $m$, such that

$$|||(I - P_k^{k-1})(u, p)|||_{0,k} \le C h_k^2\, |||(u, p)|||_{2,k} \quad \forall\, (u, p) \in W_k^\perp \times Q_{k-1}. \quad □$$


Theorem 1 (Convergence of the Two-Grid Algorithm) There exists a constant $C$, independent of $k$ and $m$, such that

$$|||(y - y^*, z - z^*)|||_{0,k} \le \frac{C}{\sqrt{m}}\, |||(y - y_0, z - z_0)|||_{0,k}. \quad □$$


Theorem 2 (Convergence of the k-th Level Iteration) There exists a constant $C$, independent of $k$ and $m$, such that

$$|||(y, z) - CMG(k, (y_0, z_0), (w, r))|||_{0,k} \le \frac{C}{\sqrt{m}}\, |||(y - y_0, z - z_0)|||_{0,k}. \quad □$$

4. EXPERIMENTAL RESULTS 


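For our numerical experiments, we choose the model problem studied in [4]:

$$-\mathrm{div}\,\{\epsilon(u) + \lambda\,\mathrm{tr}(\epsilon(u))\,I\} = f \quad \text{in } \Omega = \text{unit square},$$

$$\{\epsilon(u) + \lambda\,\mathrm{tr}(\epsilon(u))\,I\}\,\nu|_{\Gamma_i} = g_i, \quad 1 \le i \le 4,$$

where $\Gamma_i$ ($1 \le i \le 4$) represents the four sides of the unit square. The body force $f = (f_1, f_2)^t$ is defined by

$$f_1 = -\pi^2 \sin\pi x \sin\pi y + 2\pi^2\left(\frac{1}{\lambda} + 1\right) \cos\pi x \sin\pi y,$$

$$f_2 = -\pi^2 \cos\pi x \cos\pi y + 2\pi^2\left(\frac{1}{\lambda} + 1\right) \sin\pi x \cos\pi y,$$

and the boundary tractions are defined by

$$g_1 = \left(-\frac{\pi}{\lambda}\cos\pi x, \ 0\right)^t, \qquad g_2 = \left(\pi\sin\pi y, \ -\frac{\pi}{\lambda}\cos\pi y\right)^t,$$

$$g_3 = \left(-\frac{\pi}{\lambda}\cos\pi x, \ 0\right)^t, \qquad g_4 = \left(\pi\sin\pi y, \ -\frac{\pi}{\lambda}\cos\pi y\right)^t.$$

The exact solution $u = (u_1, u_2)^t \in H^1_\perp(\Omega)$ is

$$u_1 = \left(-\sin\pi x + \frac{1}{\lambda}\cos\pi x\right)\sin\pi y + \frac{4}{\pi^2},$$

$$u_2 = \left(-\cos\pi x + \frac{1}{\lambda}\sin\pi x\right)\cos\pi y.$$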
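A manufactured solution of this kind can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it takes the displacement $u_1 = (-\sin\pi x + \lambda^{-1}\cos\pi x)\sin\pi y + 4/\pi^2$, $u_2 = (-\cos\pi x + \lambda^{-1}\sin\pi x)\cos\pi y$ and the body force $f_i = -\pi^2(\ldots) + 2\pi^2(\lambda^{-1} + 1)(\ldots)$ as read from the formulas of this section, fixes the sample value `LAM = 10`, and uses hypothetical helper names. It compares $f$ with a finite-difference evaluation of $-\mathrm{div}\{\epsilon(u) + \lambda(\mathrm{div}\,u)I\}$ and checks that $u_1$ has zero mean over the unit square.

```python
import math

LAM = 10.0  # sample value of the Lame constant lambda (assumption)

def u(x, y):
    sx, cx = math.sin(math.pi * x), math.cos(math.pi * x)
    sy, cy = math.sin(math.pi * y), math.cos(math.pi * y)
    return ((-sx + cx / LAM) * sy + 4.0 / math.pi ** 2,
            (-cx + sx / LAM) * cy)

def f_formula(x, y):
    sx, cx = math.sin(math.pi * x), math.cos(math.pi * x)
    sy, cy = math.sin(math.pi * y), math.cos(math.pi * y)
    k = 2.0 * math.pi ** 2 * (1.0 / LAM + 1.0)
    return (-math.pi ** 2 * sx * sy + k * cx * sy,
            -math.pi ** 2 * cx * cy + k * sx * cy)

def second(i, x, y, kind, h=1e-3):
    """Central-difference second derivative of component i of u."""
    g = lambda a, b: u(a, b)[i]
    if kind == "xx":
        return (g(x + h, y) - 2 * g(x, y) + g(x - h, y)) / h ** 2
    if kind == "yy":
        return (g(x, y + h) - 2 * g(x, y) + g(x, y - h)) / h ** 2
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4 * h ** 2)

def f_numeric(x, y):
    """-div(eps(u) + LAM * (div u) * I), written out in second derivatives."""
    u1xx, u1yy, u1xy = (second(0, x, y, k) for k in ("xx", "yy", "xy"))
    u2xx, u2yy, u2xy = (second(1, x, y, k) for k in ("xx", "yy", "xy"))
    f1 = -(u1xx + 0.5 * (u1yy + u2xy) + LAM * (u1xx + u2xy))
    f2 = -(0.5 * (u1xy + u2xx) + u2yy + LAM * (u1xy + u2yy))
    return f1, f2

def mean_u1(n=200):
    """Midpoint-rule approximation of the integral of u_1 over the unit square."""
    h = 1.0 / n
    return sum(u((i + 0.5) * h, (j + 0.5) * h)[0]
               for i in range(n) for j in range(n)) * h * h
```

The constant $4/\pi^2$ in $u_1$ is exactly what makes the displacement orthogonal to the translations in $RM$, since $\int_\Omega \sin\pi x \sin\pi y \, dx\,dy = 4/\pi^2$.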


First, we describe the implementation of the conforming multigrid method CMG. Let $\phi_i^k$ be the piecewise linear function which equals 1 at exactly one vertex $p_i$ and equals 0 at all other vertices of $T \in \mathcal{T}^k$, and let $\psi_i^{k-1}$ be the piecewise constant function which equals 1 on exactly one triangle $T_i$ and equals 0 on all other triangles of $\mathcal{T}^{k-1}$. Then

$$\{\Phi_i^k\}_{1 \le i \le n_k} = \{(\phi_i^k, 0, 0), \ (0, \phi_i^k, 0), \ (0, 0, \psi_i^{k-1})\}$$

forms a conforming basis of $W_k \times Q_{k-1}$. The matrix representation of $B_k$ with respect to the basis $\{\Phi_i^k\}_{1 \le i \le n_k}$ in the CMG algorithm is equal to $M_k^{-1} S_k$, where $M_k$ is the mass matrix and $S_k$ is the stiffness matrix. Let $I_k^{k-1}$ also denote the matrix representation of the intergrid transfer operator $I_k^{k-1}$. Then we have

$$I_k^{k-1} = M_{k-1}^{-1} (E_{k-1}^k)^T M_k,$$

where $E_{k-1}^k$ is the matrix representation of the natural imbedding from $W_{k-1} \times Q_{k-2}$ into $W_k \times Q_{k-1}$. Let $X_k$ be the vector space which consists of the coefficients of the functions in $W_k \times Q_{k-1}$ with respect to the basis $\{\Phi_i^k\}_{1 \le i \le n_k}$. Similarly we define $X_k^\perp$ as the equivalent vector space to $W_k^\perp \times Q_{k-1}$. With the compatibility condition (3), the CMG algorithm can be rewritten in matrix form for the following problem:


Find $(Y, Z) \in X_k^\perp$ such that

$$(M_k^{-1} S_k)|_{X_k^\perp} (Y, Z)^t = (W, R)^t,$$

where $(W, R)^t \in X_k^\perp$.



For $k = 1$, $CMG(1, (Y_0, Z_0), (W, R))$ is the solution obtained from a direct method, i.e.,

$$CMG(1, (Y_0, Z_0), (W, R)) = (M_1^{-1} S_1)|_{X_1^\perp}^{-1} (W, R).$$

For $k > 1$,

Smoothing Step: the approximation $(Y_m, Z_m) \in X_k^\perp$ is constructed recursively from the initial guess $(Y_0, Z_0) \in X_k^\perp$ and the equations

$$(Y_l, Z_l) = (Y_{l-1}, Z_{l-1}) + \frac{1}{\Lambda_k^2} M_k^{-1} S_k \big((W, R) - M_k^{-1} S_k (Y_{l-1}, Z_{l-1})\big), \quad 1 \le l \le m.$$

Here, $\Lambda_k := C h_k^{-2}$ is greater than or equal to the spectral radius of $(M_k^{-1} S_k)|_{X_k^\perp}$, and $m$ is an integer to be determined later.

Correction Step: The coarser-grid correction in $X_{k-1}^\perp$ is obtained by applying the $(k-1)$-th level conforming iteration twice. In other words, it is the standard W-cycle multigrid method. More precisely,

$$(V_0, Q_0) = (0, 0) \quad \text{and}$$

$$(V_i, Q_i) = CMG(k - 1, (V_{i-1}, Q_{i-1}), (\bar{W}, \bar{R})), \quad i = 1, 2,$$

where $(\bar{W}, \bar{R}) \in X_{k-1}^\perp$ is defined by

$$(\bar{W}, \bar{R}) := I_k^{k-1} \big((W, R) - M_k^{-1} S_k (Y_m, Z_m)\big).$$


Then

$$CMG(k, (Y_0, Z_0), (W, R)) = (Y_m, Z_m) + E_{k-1}^k (V_2, Q_2).$$


With respect to the basis $\{\Phi_i^k\}_{1 \le i \le n_k}$, the mass matrix $M_k$ has seven entries per row, so that it is costly to invert $M_k$ in the implementation of the algorithm at each level of the multigrid. In practice, we replace $M_k$ by an appropriate $N_k$ satisfying


(i) $M_k$ and $N_k$ are spectrally equivalent, i.e., there is a constant $\beta$, independent of $h_k$, such that

$$0 < \beta^{-1} \le \frac{(N_k U, U)_{\ell_2}}{(M_k U, U)_{\ell_2}} \le \beta \quad \forall\, U \in X_k, \ U \ne 0.$$

(ii) $N_k^{-1} S_k : X_k^\perp \to X_k^\perp$.

(iii) $N_{k-1}^{-1} (E_{k-1}^k)^T N_k : X_k^\perp \to X_{k-1}^\perp$.
The conditions (ii) and (iii) are essential because the solution of our problem should lie in $X_k^\perp$. In the smoothing step, instead of $\Lambda_k$, we use $\Lambda_{N_k,k}$, which is the spectral radius of $(N_k^{-1} S_k)|_{X_k^\perp}$, and by Lemma 3 we have

Spectral radius of $(N_k^{-1} S_k)|_{X_k^\perp} \le C h_k^{-2}$.

The multigrid algorithm CMG is convergent with respect to the norm

$$|||\cdot|||_{0,k} = (M_k\,\cdot, \cdot)_{\ell_2}^{1/2}.$$

By a slight modification of the proof of the convergence theorem of the CMG algorithm, we obtain the convergence theorem of the multigrid algorithm containing $N_k$ instead of $M_k$ with respect to $(N_k\,\cdot, \cdot)_{\ell_2}^{1/2}$, which is equivalent to $|||\cdot|||_{0,k}$. See [2] for more detail. For this specific experiment on the unit square we take $N_k = \mathrm{diag}(M_k)$ as suggested in [2], which allows the use of an under-relaxed Jacobi scheme of smoothing. Most rows of the stiffness matrix $S_k$ have sixteen entries so that most rows of $N_k^{-1} S_k$ have again sixteen entries. Note that the matrix representation for $I_k^{k-1}$ has again seven entries per row. On the coarsest grid we use a direct solver for the (6 x 6) linear system which comes from the matrix representation with respect to the basis of $W_1^\perp \times Q_0$.

The performance of multigrid algorithms has usually been measured in Work Units. On serial machines, since the total CPU time is proportional to the amount of computational work and smoothing steps make up most of the multigrid work, a reasonable unit of effort is the Work Unit (WU) defined in [3] as the amount of computation in one smoothing step on the finest grid.


However, on parallel machines (in particular, massively parallel machines adopting data parallelism) we use a somewhat different method to measure the computational work. In this paper, we use one WU as the amount of computation needed in one smoothing step of the conforming multigrid method CMG on the finest grid on a serial machine. Let $n_k$ be the number of unknowns at the $k$-th level and $Q_{comp}$ be the number of operations required to compute one smoothing step at each mesh point. Then we have

$$n_J Q_{comp} = 1 \ \text{(WU)}$$


where the $J$-th level represents the finest grid. Let $p$ be the number of processors and assume a two-dimensional square data distribution (cf. Chapter 5 of [7]). Then the number of unknowns of the $k$-th level allocated to each processor is

$$r_k = \left\lceil \frac{n_k}{p} \right\rceil \quad \text{and} \quad n_k = \left(\frac{1}{4}\right)^{J-k} n_J \quad \text{for } k = 1, \ldots, J,$$

where $\lceil x \rceil$ is the smallest integer not less than $x$. On a parallel machine we need an additional unit to measure the communication work. We define one CU (Communication Unit) as the amount of communication needed in one smoothing step of the conforming multigrid method CMG when we assume a large system of $p \cdot n_J$ unknowns. Let $Q_{comm}$ be the number of interprocessor communication steps required to compute one smoothing step at each mesh point. Since about $4\sqrt{n_J}$ mesh points in a processor do interprocessor communication in the smoothing step of the conforming multigrid method CMG, we have

$$4\sqrt{n_J}\, Q_{comm} = 1 \ \text{(CU)}.$$

Table I: V-cycle of CMG when h = 1/64

             λ = 10               λ = 100              λ = 1000
smoothing    WU    CU    iter     WU    CU     iter    WU    CU     iter
1            68    582   1626     244   2073   5788    334   2842   7935
2            67    572   798      223   1894   2645    293   2491   3478
3            66    559   520      201   1714   1595    255   2169   2019
4            64    546   381      184   1564   1092    226   1924   1343

Let $T_{comp}$ be the time needed to perform the computational work of one smoothing step at
one mesh point and $T_{comm}$ be the time needed to perform the interprocessor communication
in one smoothing step at one mesh point. The multigrid algorithms in this paper are one-sided
methods, i.e., they use the smoothing step only before the correction step. If smoothing steps are
used before and after the correction step, the multigrid method is called symmetric. Note that as
far as convergence is concerned, a symmetric V-cycle multigrid iteration is the same as two
one-sided V-cycle iterations (see [8]).

The programs execute the multigrid iterations until the discrete $L^2$ relative error is less than
.03 for the mesh size h = 1/64 (10,498 unknowns) and for various numbers of smoothings and
values of λ. The experiments reported here were run in double-precision arithmetic on CM-5
Vector Units with 32 processors.

In Table I, the numbers in the columns for λ = 10, 100, 1000 represent Work Units,
Communication Units and $N_{iter}$ (the number of iterations of CMG). While we have only
proven that CMG converges for the W-cycle with many smoothing steps, we see that in practice
it converges even for the V-cycle with one smoothing step. In both cases, convergence is
independent of the mesh size $h_k$ and the Lamé constant λ. The total amount of computational
work of a 7-level V-cycle is


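$$W_{comp} = m \sum_{k=2}^{7} \left\lceil \left(\frac{1}{4}\right)^{7-k} \frac{n_7}{p} \right\rceil \frac{N_{iter}}{n_7}.$$

The total amount of communication of the 7-level V-cycle is

$$W_{comm} = m \sum_{k=2}^{7} \sqrt{\left\lceil \left(\frac{1}{4}\right)^{7-k} \frac{n_7}{p} \right\rceil}\; \frac{N_{iter}}{\sqrt{n_7}}.$$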
The total elapsed time is

$$T = W_{comp} T_{comp} + W_{comm} T_{comm}.$$

Therefore the performance of the multigrid algorithm depends on the ratio between $T_{comp}$
and $T_{comm}$. This ratio is not easy to obtain because it depends heavily on the implementation
of the algorithms, e.g., the topology of the data distribution and the distance of communications.
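
The bookkeeping behind the WU and CU columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes the per-processor load on level k is ceil((1/4)^(7-k) n_7 / p), that levels 2 through 7 are smoothed m times per V-cycle, and that the communication cost on a level scales with the square root of that load. The function names are hypothetical. With n_7 = 10498, p = 32, one smoothing and 1626 iterations, the rounded values agree with the first row of Table I.

```python
import math


def per_processor_loads(n_finest, p, coarsest=2, finest=7):
    """Assumed load r_k = ceil((1/4)^(finest-k) * n_finest / p) for each smoothed level."""
    return [math.ceil(0.25 ** (finest - k) * n_finest / p)
            for k in range(coarsest, finest + 1)]


def work_units(n_finest, p, m, n_iter):
    """Computational work in WU: m smoothings per level, n_iter V-cycles."""
    return m * sum(per_processor_loads(n_finest, p)) * n_iter / n_finest


def comm_units(n_finest, p, m, n_iter):
    """Communication work in CU: cost per level assumed proportional to sqrt(r_k)."""
    loads = per_processor_loads(n_finest, p)
    return m * sum(math.sqrt(r) for r in loads) * n_iter / math.sqrt(n_finest)


if __name__ == "__main__":
    # h = 1/64 (10,498 unknowns), 32 processors, one smoothing, 1626 iterations
    print(round(work_units(10498, 32, 1, 1626)),
          round(comm_units(10498, 32, 1, 1626)))
```

Because both quantities are linear in the iteration count, the WU and CU columns in the tables move in lockstep with the iter column for a fixed number of smoothings.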


5. ACCELERATION OF MULTIGRID METHOD 


Even though the P-1 conforming multigrid method is robust with respect to λ, the
convergence is slow in any practical sense. In this section we present a heuristic argument
for the acceleration of the multigrid algorithm CMG.

Replacing $p$ and $q$ by $\sqrt{\omega}\,p$ and $\sqrt{\omega}\,q$ ($\omega > 1$), respectively, we use the
argument in Chapter 3 of [7] to show the uniqueness of the solution of the equations (6) and (7),
and to derive the following discretization error estimate:

IIH + life ^1H “ 2^^:lHRn) + V^\\p - Pk\\L‘^{n) 

< Cuhl ||1 / ^ II 5iHii-i/2(ri) 

I ~ ~ i=i ~ ~ 

Also, following the argument in Section 3, we can develop the same multigrid algorithm for the 
problem: 

Find (y, z) G W-j^ x Qk-i such that 



^ Iffc X Qk-i ■ 


For positive definite systems of which energy norms are equivalent to norm, the 
normalized eigenfunctions (with respect to norm) corresponding to the large eigenvalues 
have large norm, which means that these eigenfunctions are highly oscillatory. However 
our linear system induced from the mixed finite element discretization of the pure traction 
problem is indefinite. Moreover, the solution space is composed of two different spaces. 
One is the space of piecewise linear functions and another is the space of piecewise constant 
functions. Using MATLAB we have investigated the relation between the eigenvalues and the norms
$\|u\|_{H^1}$ and $\|[p]\|_{L^2}$ of the normalized eigenfunctions $(u, p)$ (with respect to
$|||\cdot|||_{0,k}$), where $[p]$ represents the jump across the edges of each triangle in
$\mathcal{T}_{k-1}$. Figure 1 shows the eigenvalues and the norms $\|u\|_{H^1}$ and $\|[p]\|_{L^2}$ of the
normalized eigenfunctions of $N_k^{-1} S_k$ where h = 1/16 (706 unknowns). The
eigenvectors corresponding to the negative eigenvalues have large $\|[p]\|_{L^2}$, which means $p$ is
highly oscillatory, so that the error of $p$ corresponding to the negative eigenvalues is not
reduced enough by the smoothing step to be corrected in the correction step. By introducing the
weighting factor, we can magnify the size of the negative eigenvalues with little effect on the
general distribution of eigenvalues. Figure 2 shows the eigenvalues and the norms $\|u\|_{H^1}$ and
$\|[p_\omega]\|_{L^2}$ of the normalized eigenfunctions of $N_k^{-1} S_{\omega,k}$ with weighting
$\omega = 7$. With such a weighting factor, the magnitudes of the negative eigenvalues become
larger while those of the positive eigenvalues grow little. Therefore we expect better performance
of the multigrid method for the system with the weighting factor.

Since we use the Gershgorin theorem to estimate the maximum eigenvalue, we always
over-estimate it. Therefore, to accelerate our multigrid algorithm, it is useful to
use a damping factor $\theta$ in the smoothing step as follows:

02 / \ 

{yi,zi) = - Bu,,kiyi-i,zi-i)j , l < f < m. 
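
To make the over-estimation concrete, the following Python sketch compares the Gershgorin bound with the exact largest eigenvalue for a small model matrix. The matrix is an assumption chosen for illustration (the 1-D Laplacian tridiag(-1, 2, -1), whose eigenvalues are known in closed form); it is not the operator used in this paper, and the function names are hypothetical.

```python
import math


def gershgorin_upper_bound(matrix):
    """Largest right end point of the Gershgorin discs: max_i (a_ii + sum_{j != i} |a_ij|)."""
    return max(row[i] + sum(abs(v) for j, v in enumerate(row) if j != i)
               for i, row in enumerate(matrix))


def laplacian_1d(n):
    """Model matrix tridiag(-1, 2, -1) of order n (an illustrative stand-in)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def largest_eigenvalue_laplacian_1d(n):
    """Exact largest eigenvalue of tridiag(-1, 2, -1): 2 - 2 cos(n pi / (n + 1))."""
    return 2.0 - 2.0 * math.cos(n * math.pi / (n + 1))


if __name__ == "__main__":
    n = 8
    bound = gershgorin_upper_bound(laplacian_1d(n))
    exact = largest_eigenvalue_laplacian_1d(n)
    # The ratio bound / exact exceeds 1, so a step scaled by 1 / bound is shorter than needed.
    print(bound, exact, bound / exact)
```

A relaxation step scaled by the reciprocal of the bound is therefore smaller than the step the true spectrum would allow, which is the slack that a damping factor larger than 1 can recover.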

There is one more reason why the damping factor is useful. In Figures 1 and 2, there are
two or three peaks of $\|u\|_{H^1}$, which means that the error of $u$ corresponding to mid-ranged
positive eigenvalues is not reduced enough by the smoothing step to be corrected in the correction
step. By using a damping factor, the error of $u$ corresponding to several peaks can be reduced
simultaneously in the smoothing step. Numerical results for the effect of the weighting and
damping factors are shown in Tables II-IV. However, as $\theta \to 2$, the multigrid algorithm
suddenly diverges, so it is risky to take $\theta \approx 2$ in order to get better convergence results.
Tables V-VII show the convergence results with 2 smoothings, with $\theta = 1$ for the first smoothing
and $\theta = \alpha$ for the second smoothing. With these alternating smoothings, we can safely take
$\theta$ near 2. Using these weighting and damping factors, we get results about 30 times faster.

Acknowledgements. I would like to thank Professor S. V. Parter for his valuable advice 
and encouragement. 
Table II: V-cycle of CMG with one smoothing, λ = 10 and h = 1/64

             ω = 1               ω = 3               ω = 4               ω = 10
  θ      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0     68   582   1626     13   111    310      …     …      …     19   160    446
 1.2      …     …   1171      9    78    217      9     …    217     13     …    309
 1.4     38     …    907      7    58    161      7     …    160      …     …      …
 1.6     32     …    758      5    45    126      …    44      …      …    62      …
 1.8       divergent          …    37    103      4     …     97      6     …      …
 2.0       divergent           divergent          3    29     81      5    39    110

Table III: V-cycle of CMG with one smoothing, λ = 100 and h = 1/64

             ω = 1               ω = 6               ω = 7               ω = 10
  θ      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0    244  2073      …     16   136    380     16   139      …     18   157    439
 1.2    210  1788   4992      …     …      …      …     …      …      …   109    304
 1.4      …     …   8381      …    73      …      8     …      …      …     …    223
 1.6       divergent          7    41      …      7    57      …      7     …      …
 1.8       divergent         11    90      …      6    53    147      …     …      …
 2.0          …                divergent           divergent           divergent



Table IV: V-cycle of CMG with one smoothing, λ = 1000 and h = 1/64

             ω = 1               ω = 7               ω = 8               ω = 10
  θ      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0    334  2841   7935      …   144      …     17   146    408     19     …    440
 1.2    336  2855   7972      …   101      …     12   102    285     13     …    305
 1.4       divergent          9    78    217      9    76    213      9    80    224
 1.6       divergent          8    67    187      7    62    173      7    62    173
 1.8       divergent           divergent         10    88    247      6    53      …
 2.0       divergent           divergent           divergent           divergent

Table V: V-cycle of CMG with alternating smoothings, λ = 10 and h = 1/64

             ω = 1               ω = 3               ω = 4               ω = 10
  α      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0     67   572    798     13   112    156     13   113    158     19   160    224
 1.2     56   473    661     11    92    128     11    92    129     15   131    183
 1.4     47   397    555      9    76    106      9    77    107     13   108    151
 1.6     40   340    475      7    64     89      7    64     89     11    90    126
 1.8     35   298    416      6    54     76      6    54     75      9    76    106
 2.0     32   271    378      6    47     66      5    46     64      8    64     90

Table VI: V-cycle of CMG with alternating smoothings, λ = 100 and h = 1/64

             ω = 1               ω = 6               ω = 7               ω = 10
  α      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0    223  1894   2645     16   136    190     16   138    193     18   158    220
 1.2    191  1620   2263     13   112    156     13   114    159     15   129    180
 1.4    172  1461   2040     11    93    130     11    94    131     12   106    148
 1.6    170  1445   2017      9    79    110      9    79    110     10    88    123
 1.8    232  1977   2761      8    69     96      8    67     94      9    74    103
 2.0       divergent          8    64     90      7    59     83      7    63     88

Table VII: V-cycle of CMG with alternating smoothings, λ = 1000 and h = 1/64

             ω = 1               ω = 7               ω = 8               ω = 10
  α      WU    CU   iter     WU    CU   iter     WU    CU   iter     WU    CU   iter
 1.0    293  2491   3478     17   143    199     17   146    204     19   158    220
 1.2    255  2171   3032     14     …    164     14   120    167     15   129    180
 1.4    241  2049   2861     12    98    137     12   100    139     12   106    148
 1.6    271  2308   3223     10    83    116     10    83    116     10    89    124
 1.8       divergent          9    74    103      8    72    100      9    74    104
 2.0       divergent          9    73    102      8    65     91      8    64     90

MULTIPLE SCALE SIMULATION FOR
TRANSITIONAL AND TURBULENT FLOW

Chaoqun Liu* and Zhining Liu†

Numerical Simulation Group, Department of Mathematics 
University of Colorado at Denver 
Denver, CO 


SUMMARY 

A new concept, multiple scale simulation (MSS), is presented in this paper. The
basic idea is that the flow is decomposed into several component groups according
to spatial and temporal length scales. Each group has its own subdomain, governing
system, mesh size, and discretization method. The simulation is then performed
groupwise. This approach has been successfully applied in combination with the
intergrid dissipation technique for simulation of transitional and turbulent flow in 3-D
boundary layers, and it is feasible for 3-D airfoils and other more complex
configurations. MSS should prove to ameliorate the scale problems associated with
conventional direct numerical simulation.


INTRODUCTION 

The main challenge in direct numerical simulation (DNS) is the demand on computer
resources. Transitional and turbulent flows contain a wide range of length
scales, bounded above by the geometric dimension of the flow field and bounded below
by the dissipative action of the molecular viscosity (Canuto et al., 1988). The
ratio of the macroscopic (largest) length scale $L$ to the microscopic (smallest) length
scale $l$ (usually called the Kolmogorov scale) is $L/l = (Re)^{3/4}$, where $Re$ is the
Reynolds number. Thus, for a 3-D problem, the number of grid points, $N$, must be on the
order of $(Re)^{9/4}$ if the Kolmogorov scale is to be resolved. This estimate reveals a
fundamental difficulty with DNS for large Reynolds number flows because this resolution
requirement is far beyond the capability of current or foreseeable supercomputers.
However, this estimate is based on a single simulation on a single grid and
is, therefore, too pessimistic. Note that the length scales involved in transition and
turbulence processes are very different; for an open flow, in general, the main stream
and the linear growth of inflow disturbance are dominated by large scales that
dominate a large part of the flow field domain; small scales generally occur only in and
after breakdown areas. Extremely small scales are only meaningful in a narrow area
near the solid wall. These observations provide a clue that the total flow may be
effectively decomposed into several groups based on their length scales. The large
scale flow, dominating most of the flow field, can be simulated by conventional CFD
schemes on relatively coarse grids. For small scale flow phenomena, which play an
important role only in a small area of the flow field, high-order discretization and
very fine grids have to be used. These small scale simulations may be performed on
several grid levels in which each grid has its own subdomain and governing system.
This idea eventually leads to a multiple scale simulation (MSS) on several levels of
grids. Unlike large eddy simulation (Reynolds, 1990), the MSS approach does not
require subgrid models. A basic description of MSS and its performance for CFD
problems with simple configuration is the subject of this paper.

*Staff Scientist and Associate Professor, Applied Mathematics.
†Assistant Professor Adjunct, Applied Mathematics.

Figure 1. Idealized sketch of transition process on a flat plate.
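
The single-grid resolution estimate quoted in the Introduction is easy to evaluate numerically. The short Python sketch below only restates the two scalings $L/l = Re^{3/4}$ and $N \sim Re^{9/4}$; the function names are hypothetical and the constants of proportionality are taken to be 1, which is an assumption made purely for illustration.

```python
def kolmogorov_scale_ratio(reynolds):
    """Ratio L / l of the largest to the smallest length scale, taken as Re^(3/4)."""
    return reynolds ** 0.75


def dns_grid_points_3d(reynolds):
    """Order-of-magnitude grid count for a 3-D single-grid DNS, taken as Re^(9/4)."""
    return kolmogorov_scale_ratio(reynolds) ** 3


if __name__ == "__main__":
    for reynolds in (1.0e3, 1.0e4, 1.0e6):
        print(f"Re = {reynolds:.0e}: L/l ~ {kolmogorov_scale_ratio(reynolds):.3g}, "
              f"N ~ {dns_grid_points_3d(reynolds):.3g}")
```

At $Re = 10^4$ this already gives roughly $10^9$ grid points, which illustrates why a uniform fine grid over the whole domain is so expensive.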

ABSTRACT FLOW DECOMPOSITION EXAMPLE 

Here we consider the flat plate boundary layer flow as an example to describe 
the basic idea behind multiple scale simulation. Figure 1 depicts the natural flow 
transition process in a 3-D boundary layer, showing clearly the variations in flow 
regime scales. 
Using the fact that the flow scale of interest is generally large in the free stream
and the area before breakdown (Figure 1), we can consider the use of multiple levels of
grid to resolve the flow. Figure 2 depicts the case of three levels used in our boundary
layer example; $\Omega_j$ represents the domain that level $j$ is used to resolve, with the whole
computational domain given by

$$\Omega = \Omega_1 \cup \Omega_2 \cup \Omega_3.$$





Figure 2. Multiple level grids. 


To decompose the total flow according to those levels, suppose the physics is
governed by the time-dependent Navier-Stokes equations, which we write as

$$\frac{\partial V}{\partial t} + LV = F \quad \text{in } \Omega, \qquad V = U + U' \quad \text{at inflow.} \qquad (1)$$

Here, we also decompose the inflow vector into two components (usually, $U$ is the
steady part with large magnitude, and $U'$ is the unsteady perturbation part with
relatively small magnitude). We then decompose the total flow field into three
components according to

$$V = V_1 + V_2 + V_3, \qquad (2)$$

where $V_1$, $V_2$, and $V_3$ represent increasingly more local and finer scales of the flow so
that

$$V_2 = 0 \quad \text{in } \Omega - \Omega_2, \qquad V_3 = 0 \quad \text{in } \Omega - \Omega_3. \qquad (3)$$
To define individual governing systems for each component, first consider the
subdomain $\Omega_1$, on which we impose the system

$$\frac{\partial V_1}{\partial t} + L_1 V_1 = F_1 \quad \text{in } \Omega_1, \qquad V_1 = U \quad \text{at inflow.} \qquad (4)$$

Here, $L_1$ is the spatial difference operator in $\Omega_1$. In general, $F_1 \approx F$ can be chosen
with some freedom to represent large scale physics, so that $V_1$ represents the large
scale flow without the inflow disturbance. Thus, (4) can generally be solved by low
order schemes on a coarse grid. For subdomain $\Omega_2$, we consider the governing system

BV 

V2 = U' at inflow. ( 5 ) 

Here, $I_1^2$ represents some interpolation operator to transfer between $\Omega_1$ and $\Omega_2$. Note
that $V_2$ represents the perturbation in the flow field due to the inflow disturbance $U'$
and the presumably finer scale source term $F - F_1$. $V_2$ has a much smaller scale than
does $V_1$ and should be solved by a high-order scheme on a fairly fine middle
scale grid. For subdomain $\Omega_3$, which we choose to be a small part of the flow domain,
the governing system can be written as


dV^ 

dt 



= 0 


in H3, 
on 80,3. (6) 


$\Omega_3$'s physical scale is considered to be very small so that (6) should be resolved on an
extremely fine grid.

Note that (4)-(6) together with the decomposition (2) represent a consistent
"lower triangular" formulation that is equivalent to (1) but lends itself to
individualized treatment of various physical scales in the discretization. Its triangular form
allows for a simplified solution process: first (4) is solved to determine $V_1$, then
(5) is solved for $V_2$, then (6) is solved for $V_3$, with the final result then given by

$$V = V_1 + V_2 + V_3.$$


APPLICATION TO POISSON EQUATION 


The idea of multiple scale simulation as described allows for any desired number of 
levels, depending on available computer resources and given accuracy requirements. 
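To see the basic process more clearly, we first use a 1-D Poisson equation as an
example:

$$\frac{d^2\phi}{dx^2} = -4, \quad x \in (0, 1), \qquad \phi(0) = \phi(1) = 0. \qquad (7)$$

This problem has the analytical solution

$$\phi(x) = 2x(1 - x).$$

Using standard central differences for discretization,

$$\frac{\phi_{i+1} - 2\phi_i + \phi_{i-1}}{h^2} = -4,$$

and three levels ($\Omega_1$, $\Omega_2$, $\Omega_3$), we obtain the numerical solution at selected points:

$$\frac{\phi_{1,i+1} - 2\phi_{1,i} + \phi_{1,i-1}}{.5^2} = -4 \quad \text{in } \Omega_1,$$

$$\frac{\phi_{2,i+1} - 2\phi_{2,i} + \phi_{2,i-1}}{.25^2} = -4 - \frac{\bar\phi_{1,i+1} - 2\bar\phi_{1,i} + \bar\phi_{1,i-1}}{.25^2} \quad \text{in } \Omega_2,$$

$$\frac{\phi_{3,i+1} - 2\phi_{3,i} + \phi_{3,i-1}}{.125^2} = -4 - \frac{\bar\phi_{2,i+1} - 2\bar\phi_{2,i} + \bar\phi_{2,i-1}}{.125^2} \quad \text{in } \Omega_3,$$

where $\bar\phi_1 = I_1^2 \phi_1$ and $\bar\phi_2 = I_2^3(\bar\phi_1 + \phi_2)$.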


Letting $\phi^{(1)}$, $\phi^{(2)}$, and $\phi^{(3)}$ denote the final solution at grid levels 1, 2, and 3, we
obtain the results shown in Table 1. Clearly, the more grid levels are used, the better
the results.

solution     φ_1    φ^(1)    φ_2      φ^(2)     φ_3       φ^(3)      analytical
φ(0)         0      0        0        0         0         0          0
φ(0.125)            0.125             0.1875    0.03125   0.21875    0.21875
φ(0.25)             0.25     0.125    0.375     0.0       0.375      0.375
φ(0.375)            0.375             0.4375    0.03125   0.46875    0.46875
φ(0.5)       0.5    0.5      0        0.5       0.0       0.5        0.5
φ(0.625)            0.375             0.4375    0.03125   0.46875    0.46875
φ(0.75)             0.25     0.125    0.375     0.0       0.375      0.375
φ(0.875)            0.125             0.1875    0.03125   0.21875    0.21875
φ(1)         0      0        0        0         0         0          0

Table 1. Comparison of the numerical solution with three grid levels
and the analytical solution for Poisson equation.


This simple example illustrates the basic idea underlying MSS, and it suggests
that MSS might provide a very efficient way to perform DNS for very complex
configurations.
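
The three-level Poisson example can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions rather than the authors' code: every level is taken to cover the whole interval (0, 1), transfer between levels is linear interpolation, and each finer level solves for a correction whose source term is -4 minus the second difference of the interpolated coarser solution. All function names are hypothetical.

```python
def solve_dirichlet(rhs, h):
    """Solve (u[i-1] - 2 u[i] + u[i+1]) / h^2 = rhs[i] with u = 0 at both ends.

    Thomas algorithm for the tridiagonal system; returns nodal values
    including the two boundary zeros.
    """
    n = len(rhs)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = h * h * rhs[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (h * h * rhs[i] - d_prime[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return [0.0] + u + [0.0]


def refine(u):
    """Linear interpolation onto the grid with half the mesh size."""
    fine = []
    for left, right in zip(u, u[1:]):
        fine += [left, 0.5 * (left + right)]
    fine.append(u[-1])
    return fine


def second_difference(u, h):
    """Central second difference at the interior nodes."""
    return [(u[i + 1] - 2.0 * u[i] + u[i - 1]) / (h * h)
            for i in range(1, len(u) - 1)]


def mss_poisson(levels=3, source=-4.0):
    """Return the composite solution after each level, starting from h = 0.5."""
    h = 0.5
    composite = solve_dirichlet([source], h)
    history = [composite]
    for _ in range(levels - 1):
        h /= 2.0
        carried = refine(composite)                      # interpolated coarser result
        rhs = [source - d for d in second_difference(carried, h)]
        correction = solve_dirichlet(rhs, h)             # this level's own component
        composite = [a + b for a, b in zip(carried, correction)]
        history.append(composite)
    return history


if __name__ == "__main__":
    for level, values in enumerate(mss_poisson(), start=1):
        print(level, [round(v, 5) for v in values])
```

Because central differences are exact for quadratics, each level's composite solution agrees with $2x(1 - x)$ at that level's own grid points, which matches the behaviour seen in Table 1.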


FLAT PLATE PROTOTYPE 

In this section, we consider spatial flat plate transitional flow as an example to 
illustrate our approach. 
Large Scale Simulation ($V_1$)


The base flow is governed by the Navier-Stokes equations. Suppose there is no mass
transfer on the flat plate, and gravity is negligible, so that $F = 0$. The governing
equations can then be written as follows:

$$\frac{\partial V_1}{\partial t} + (V_1 \cdot \nabla)V_1 + \nabla P_1 - \frac{1}{Re}\nabla^2 V_1 = 0, \qquad \nabla \cdot V_1 = 0. \qquad (8)$$

For steady flat plate flow, the Blasius solution can be assumed for the large scale 
global component: 


where 




nfin) - 




(9) 


$\nu$ is the kinematic viscosity coefficient, and $f$ can be found in any textbook on boundary
layer theory (e.g., Schlichting, 1968).


Middle Scale Simulation (V 2 ) 

These scales can be determined at inflow for the so-called spatial approach. The 
governing system is 

8V 

-— + L\Vr + V2) = L'/SVi 

V2 — U'{t) at inflow. (10) 

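Considering $I_1^2 V_1 = (u_1, v_1, w_1, P_1)$ as known, and using $(x_1, y_1, z_1, t_1)$ as the
coordinate system on $\Omega_1$ and $(x_2, y_2, z_2, t_2)$ as the coordinate system on $\Omega_2$, we can
write the scalar x-component equation for $V_2 = (u_2, v_2, w_2, P_2)$ as

$$
\begin{aligned}
&\frac{\partial u_2}{\partial t_2}
 + \frac{\partial (u_1 + u_2)(u_1 + u_2)}{\partial x_2}
 + \frac{\partial (u_1 + u_2)(v_1 + v_2)}{\partial y_2}
 + \frac{\partial (u_1 + u_2)(w_1 + w_2)}{\partial z_2} \\
&\quad + \frac{\partial (P_1 + P_2)}{\partial x_2}
 - \frac{1}{Re}\left[\frac{\partial^2 (u_1 + u_2)}{\partial x_2^2}
 + \frac{\partial^2 (u_1 + u_2)}{\partial y_2^2}
 + \frac{\partial^2 (u_1 + u_2)}{\partial z_2^2}\right] \\
&= \frac{\partial (u_1 u_1)}{\partial x_1}
 + \frac{\partial (u_1 v_1)}{\partial y_1}
 + \frac{\partial (u_1 w_1)}{\partial z_1}
 - \frac{1}{Re}\left[\frac{\partial^2 u_1}{\partial x_1^2}
 + \frac{\partial^2 u_1}{\partial y_1^2}
 + \frac{\partial^2 u_1}{\partial z_1^2}\right]
 + \frac{\partial P_1}{\partial x_1}. \qquad (11)
\end{aligned}
$$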
Similarly, the y- and z-momentum equations and the continuity equation are:

dv2 d{ui + U2){vi + V 2 ) d{vi + V2){vi + V 2 ) d{vi + U2)('U;i + W 2 ) 
_ + _ + 


d{Pi + P2) 1 rd^{vi + V2) d^{vi+V2) d'^{vi+V2)-. 

^ ^ ^ _l_ ^ ^ J 


dy 2 Re ^ dx 2 

d{viui) d{vivi) d{viwi) 

I r\ ~l 


dyl 


dzi 


1 


, dPi 

f~ r\ O i O J ^ 


dxi ' dyi ' dzi' Re''dx\ dyj dzf^ ' dyi 
dw2 d{ui + U2){wi + W 2 ) d{vi + U2)(u;i + •^ 2 ) g(u;i + W2){wi + W 2 ) 

dt 2 dx 2 dy 2 dz 2 

1 d'^{wiPW2) &^{wiPW2) d'^{wi p W2) ., 

Re dxl dyl dz\ 


( 12 ) 


a(Pi 


dz 2 


d{uiwi) d{viwi) d{wiwi) 


Re^ dxl 
du 2 dv 2 dw 2 
dx 2 dy 2 dz 2 


'^i /2 

1 ,d'^wx d'^wi d'^wi- 

+ -77^ + 


dy'^ 


dz\ 


+ 


1 

,du\ dvi 

-—- ^- - + 

5xi dy\ dzi 


dPi 

dzi 

dwi, 


(13) 

(14) 


Since linear growth and secondary instability are present, $V_2$ contains a wide range of
differing length scales, some of them rather small. We thus need to use a high-order
difference scheme on relatively fine grids. For our purposes, a fourth-order central
difference scheme on a staggered grid of resolution $h = O(0.1\lambda)$ is used, where $\lambda$ is
the so-called T-S wavelength.

For a generic partial differential equation,

$$\frac{\partial \phi}{\partial t} + L\phi = s, \qquad (15)$$

we use a second (or higher)-order backward Euler difference in the time direction:

$$\frac{\partial \phi}{\partial t} = \frac{3\phi_{ijk}^{n+1} - 4\phi_{ijk}^{n} + \phi_{ijk}^{n-1}}{2\Delta t} + O(\Delta t^2). \qquad (16)$$

Letting $L\phi = L_h \phi^{n+1}$, where $L_h$ is the spatial discretization of $L$ described below,
yields a fully implicit time-stepping scheme. This has much better stability than the
explicit scheme and is much more efficient for representing the nonlinear N-S system.
However, it requires solving a large algebraic system at each time step, for which we
have developed a multigrid algorithm based on so-called line-distributive relaxation
(Liu & Liu, 1993). Only one multigrid V-cycle is usually needed to solve this large
system, making each implicit time step comparable in CPU cost to a few steps of the
corresponding explicit scheme.
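
The second-order accuracy of the three-level backward difference in time can be checked numerically. The Python sketch below assumes the standard form $(3\phi^{n+1} - 4\phi^{n} + \phi^{n-1})/(2\Delta t)$ and applies it to the test function $\sin t$, which is chosen here only for illustration; halving the step should reduce the error by a factor of about 4. The function names are hypothetical.

```python
import math


def backward_difference_2nd(phi_new, phi_now, phi_old, dt):
    """Three-level backward difference for d(phi)/dt at the new time level."""
    return (3.0 * phi_new - 4.0 * phi_now + phi_old) / (2.0 * dt)


def truncation_error(dt, t=1.0):
    """Error of the formula applied to phi(t) = sin(t), whose derivative is cos(t)."""
    approx = backward_difference_2nd(math.sin(t), math.sin(t - dt),
                                     math.sin(t - 2.0 * dt), dt)
    return abs(approx - math.cos(t))


if __name__ == "__main__":
    coarse, fine = truncation_error(0.1), truncation_error(0.05)
    # A ratio close to 4 indicates an O(dt^2) truncation error.
    print(coarse, fine, coarse / fine)
```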

To minimize numerical viscosity and phase error, fourth-order central differences
(on the staggered grid) are applied in space:

$$\frac{\partial^2 \phi}{\partial x^2} = \frac{-\phi_{i+2} + 16\phi_{i+1} - 30\phi_i + 16\phi_{i-1} - \phi_{i-2}}{12\Delta x^2}, \qquad
\frac{\partial \phi}{\partial x} = \frac{-\phi_{i+2} + 27\phi_{i+1} - 27\phi_i + \phi_{i-1}}{24\Delta x}. \qquad (17)$$
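
Both stencils in (17) can be checked against polynomials. The Python sketch below assumes the usual interpretation of the staggered first-derivative formula, namely that the four values sit at half-cell offsets and the derivative is evaluated at the midpoint between $\phi_i$ and $\phi_{i+1}$; that placement, and the function names, are assumptions made for this illustration.

```python
def second_derivative_4th(f, x, h):
    """Fourth-order central approximation of f''(x) on a collocated 5-point stencil."""
    return (-f(x + 2 * h) + 16 * f(x + h) - 30 * f(x)
            + 16 * f(x - h) - f(x - 2 * h)) / (12 * h * h)


def staggered_first_derivative_4th(f, x_mid, h):
    """Fourth-order approximation of f'(x_mid) from values at half-cell offsets."""
    return (-f(x_mid + 1.5 * h) + 27 * f(x_mid + 0.5 * h)
            - 27 * f(x_mid - 0.5 * h) + f(x_mid - 1.5 * h)) / (24 * h)


if __name__ == "__main__":
    h = 0.1
    # Exact for x^4 (second derivative 12 x^2) and for x^3 (first derivative 3 x^2).
    print(second_derivative_4th(lambda x: x ** 4, 1.0, h))
    print(staggered_first_derivative_4th(lambda x: x ** 3, 1.0, h))
```

Up to rounding, the second-derivative stencil reproduces $12x^2$ for $x^4$ and the staggered stencil reproduces $3x^2$ for $x^3$, consistent with fourth-order accuracy.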


Figure 3 depicts the stencil of the discretized x-momentum equation for the interior
grid points. (For simplicity, we drop the subscript "2" in Figures 3 and 4.)

Figure 3. Neighbor points for the x-momentum equation in the (x, y) plane.

Since a staggered grid is used when we discretize the x-momentum equation, we
need to evaluate $v$ at the points associated with $u$, where we have no definition for $v$.
This we do by high-order interpolation (Figure 4):

$$v_{ijk}\big|_{u_{ijk}} = \big[\,9\,(v_{ijk} + v_{i\,j+1\,k} + v_{i-1\,j\,k} + v_{i-1\,j+1\,k})
 - (v_{i-2\,j-1\,k} + v_{i-2\,j+2\,k} + v_{i+1\,j-1\,k} + v_{i+1\,j+2\,k})\,\big]/32. \qquad (18)$$
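
The weights in (18) can be sanity-checked with a small Python sketch. The geometry used here is an assumption made for illustration: the four nearest $v$ points are placed at offsets $(\pm h/2, \pm h/2)$ from the $u$ point and the four outer points at $(\pm 3h/2, \pm 3h/2)$, which is one reading of the index pattern in (18). Under that assumption the formula reproduces every polynomial of degree three or less exactly, consistent with a fourth-order interpolation.

```python
def interpolate_v_at_u(v, h):
    """Apply the 9 / -1 / 32 weights of (18) around a u point placed at the origin.

    v is a callable v(x, y); the offsets below are an assumed staggered layout.
    """
    near = sum(v(sx * 0.5 * h, sy * 0.5 * h) for sx in (-1, 1) for sy in (-1, 1))
    far = sum(v(sx * 1.5 * h, sy * 1.5 * h) for sx in (-1, 1) for sy in (-1, 1))
    return (9.0 * near - far) / 32.0


def cubic(x, y):
    """Arbitrary cubic test polynomial with value 1 at the origin."""
    return 1.0 + 2.0 * x - 3.0 * y + x * x + x * y - y * y + x ** 3 - 2.0 * x * x * y + y ** 3


if __name__ == "__main__":
    h = 0.2
    print(interpolate_v_at_u(cubic, h))                           # close to 1
    print(interpolate_v_at_u(lambda x, y: x * x * y * y, h))      # O(h^4) remainder
```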


Figure 4. Fourth-order approximation for $v_{ijk}$ at the $u_{ijk}$ point.
Small Scale Simulation ($V_3$) and Intergrid Dissipation


The subdomain $\Omega_3$ that supports $V_3$ includes the transition zones and near wall
areas that exhibit very small length scales corresponding to vortex breakdown and
transition processes. Very fine grids must therefore be used to resolve these scales.
Fortunately, the task that this represents is substantially reduced by the small size
of $\Omega_3$ (and, perhaps, the fact that the boundary conditions for $V_3$ are homogeneous
Dirichlet).

Let $I_i^j$ denote the interpolation operator for the variables transferring from $\Omega_i$ to
$\Omega_j$, and define

$$u_0 = I_2^3(I_1^2 u_1 + u_2), \qquad v_0 = I_2^3(I_1^2 v_1 + v_2),$$

$$w_0 = I_2^3(I_1^2 w_1 + w_2), \qquad P_0 = I_2^3(I_1^2 P_1 + P_2).$$


Furthermore, since the time scale in $\Omega_3$ is also much smaller, we need to obtain the
variables at the local time $t$. For example,

$$u_0(t) = \frac{u_0(t_1)\,(t_2 - t) + u_0(t_2)\,(t - t_1)}{t_2 - t_1},$$

where $t_1$ and $t_2$ are two time levels in $\Omega_2$. Then the resulting governing system for $V_3$
can be written as


duz 

dU 


+ "x—[(2uo + ua)ua] + —[(uq + ua)^^3] + — {v^uo) 
0 x 3 01/3 01/3 


■^[{wo + u)a)ua] + —(loa^o) 

OZ 3 OZ 3 


1 ^d'^U3 , ^^ua , d'^U3^ ^ 5Pa 


Re^ dxl dyl 


^duo , duouo I duoVo ^ duoWo 1 ^d^uo ^ d^uo 


dzi 

d^uo 


dx 2 


dy: 

dv3 _ 

dt 3 8 x 3 


dz 2 


Re^ dxo 


) + 


8 x 3 
8P0, 


8 zo 8 x 


2 8y2 u^2 

8 8 8 

[(uo + U3)ua] + + ^[(2uo + U3)'y3] 

5 r, , ^ 1 , 5 ^ 1 f8'^V3 , 8'^V3 , 8'^V3^ , 8P3 

+ U,3)«3] + ^('»3«o) - ^ 


8vo 8uoVo 8voVo 8woVo 1 .8'^ Vo 8‘^Vo 8'^ Vo. 8P0. 

T dTV 0„2 ' Q..2 0-2 / ' 1 ’ 


■ 8 t 


8x2 
8 w 3 


8 z 2 


Re^ 8 x 1 8 yl 8 zl’ 8 y 2 


8 y 2 


8wo 8uoWo 8voWo 8woWq 

T To T 


1 ,8'^W3 5T03 8‘^W3 8P3 

'V n „2 0..2 a -.2 / I" 


Re^ 8 x 1 


8yl 


8 t 2 


8 xn 


8 y 2 


8 z 2 

8 u 3 


1 ,8 '^Wo 8‘^Wo 

( Cl T T Cl O T 


Re^ 8 x'^ 


8 y^ 


8 zi 

8 ^Wo 


) + 


_ 8 w 3 

8x3 8y3 8 z3 


8 z 3 
8P0 
8z ’ 
8wo 


t 8 uQ 8 ^ 

8 x 2 8 y 2 8 z 2 




( 19 ) 


( 20 ) 


( 21 ) 

( 22 ) 
The basic approach we use in $\Omega_3$ here is the same as we use in $\Omega_2$. The grids are
now much finer, though not yet fine enough to resolve the Kolmogorov scale. Since
the central difference scheme is nondissipative, trouble occurs in the breakdown stage
where the shear layer develops and the large vortices decompose into small scale ones.
The numerical simulation will thus have a huge energy burst, the disturbance velocity
will be amplified by several orders of magnitude somewhere inside the flow field, and
the computation then goes unstable. These nonphysical phenomena occur because
our scheme is nondissipative, and the grid size is not small enough to represent the
dissipative small vortices.

The recently developed technique of intergrid dissipation (Liu & Liu, 1994b) can
be used to provide the dissipation contributed by small vortices without distortion
of the physical solution. We describe this process as follows. At each time step, we
make the replacement

Here, the scripts $h$ and $2h$ indicate the respective fine and coarse grid approximations,
$I_h^{2h}$ and $I_{2h}^{h}$ refer to the respective restriction and interpolation, and $\alpha$ is a dynamic
weight factor. In $\Omega_3$, we choose


a = 


A.r3Ay3A~; 

Ats 


(V'3 ■ V 3 ), 


(23) 


where $\Delta x_3$, $\Delta y_3$, and $\Delta z_3$ are the local spacings in the x-, y-, and z-directions,
and $\Delta t_3$ is the local time step.


Numerical Test 


For the actual computation, a stretched grid that becomes much denser near the 
solid wall is used. Consider the transformation 


X = 

y = 

z — 

J = 


y(''7) = 


ymax^^] 


ymax^ J/max('^7i 


tiY 


c 

d{x,y,zy 


Vy^ 


and 


U = yrjU, 
V = V, 

W = y^w, 
where $y_{max}$ is the height of the computational domain in the physical coordinate $y$,
$\eta_{max}$ is the height of the computational domain in the computational coordinate $\eta$,
and $\sigma$ is a constant that can be used to adjust the concentration of grid points. We
can then write the contravariant based governing equations on $\Omega_3$ as follows:


dU: 


" + + u,)Us] + 

dr) dr)^ y^ ' 


dt ' yr,d^ 

LI 

VvdC 
1 .d^Us 


Id dP 

+-wt[(Wo + Ws)Us] + --§^[WsUo] + 

3. , d Ms. . dMs. 


Re*o ' ap 


1 U^U3 


,du„ , idUM , d,UoVo, , 1 , ap„ 

— -I-W 7 -h ^ - H- -^[UoWq] + yr,-^ 

dt y^ d^ dy y^ y^ d^ 


1 


MUo , 1 d^ , d Mo, , dMo,, 


Rei^ dp 

vM + 4l(Co + U,)v,] + ^(U^V„) + |-[( 2 V„ + V,)V,] 


dt 

d 


d^' 


d dP 

+ ^[(Wo + WsWs] + ^(WsVo) + 


dr) 


1 


, dMz , 1 dM3 , dV3 , dM3, 
“— sra“ + VriVyy-a - Vv-Etprl 


de y, dy^ 


dr) 


dc 


, dVo , dUoVo , dVoVo , dWoVo , dPo 

“‘''’"aT + ^ ^ 

, a“v„ , 1 a^Vo , dv„ , 

+ VvVyy^r + y^^TrJ)’ 


1 


i?e 5 

5 W 3 


5^2 ' y, 5772 ' dy ' dC 
, 1 d , 1 , 5,(l/o + y3)W3, 

— s, —' 

d WWn 1 d dP-, 

^ ^ ^^~dC 


1 

L ^ >0 ' 


3 , 1 52 ,W 3 , , d ,W 3 . d^W3 


Re*o' de 
.dWo 




Vv 

iMll + LLLLL + L.L 2 LL] + LL[wm + y — 


dy Vn 
-] + 


1 


,aWo , 1 d^ , d , d^Wo,, 


Re-pep v,ap^y. 


dUs dV 3 dW 3 

af a, ar 


,dUo a Vo awo, 

' a? an ec -*' 


(24) 


(25) 


(26) 

(27) 


For the details about discretization of the above system, see Liu & Liu (1994a).

To investigate the efficiency of our MSS approach, we choose to investigate the
secondary instability case with $Re_0^* = 900$. As above, we use only three levels to
describe the flow. A 130 x 18 x 10 grid is employed for both $\Omega_1$ and $\Omega_2$, which
includes a 7 T-S wavelength physical domain and a 1 T-S wavelength buffer (Liu &
Liu, 1993); a 42 x 18 x 18 patch is used for $\Omega_3$. The patch covers the downstream half
of the flat plate except for the buffer domain. The stretch parameter is $\sigma = 3.75$.

As mentioned above, the Blasius similarity solution is employed as the base flow
($V_1$), which is widely used as the base flow for flat plate transition. A Benney-Lin
type disturbance (Benney & Lin, 1960),

= Real{e2d4?(y)e-*‘^=^' + 

is imposed on the inflow to generate $V_2$. Here, $\phi_{2d}(y)$ and $\phi_{3d\pm}(y)$ correspond,
respectively, to 2-D and 3-D eigensolutions of the Orr-Sommerfeld equation, and the
superscript $(k)$ denotes different velocity components. Other parameters used in this
work are as follows:


Rel = 

900, Fr = 86 {u 2 d = wsrf 

P = 

b-l, ymax — ymax — 50, 

C>i 2 d = 

0.2229 - 0.0045H, 

Ot^d = 

0.2169 - 0.00419i, 

^ 2 d = 

0.03, e 3 d± = 0.01, 

0 * 

II 

303.9, = 529.4, 

At = 

Tr_5/240. 


0.0774), 


Figure 5 depicts the contour plots of the spanwise perturbation vorticity ($V_2$) in
plane $y_0 = 0.1123$ at t = 3T, 4T, ..., 7T, where T is the so-called T-S period. It is
quite clear that within this level, the flow scale is still pretty large, and only large
scale lambda waves can be resolved.

Figure 6 presents contour plots of spanwise vorticity produced by $V_3$ in the same
plane and at the same time as Figure 5. Though this level is still not fine enough to
catch all of the scales in the flow field, some finer scales are resolved. We find that,
in the patch ($\Omega_3$), more vortices are generated on this level and they are amplified
when they travel downstream. This is at least qualitatively correct.

The final results produced by $V_2 + V_3$ are described in Figures 7 and 8, showing
clearly that more physical details can be found than in Figure 5.

CONCLUDING REMARKS 

We have demonstrated the potential of multiscale simulation for solving fluid flow
problems to greater resolution and with better efficiency than conventional fixed-scale
methods provide. However, several important improvements need to be achieved:

• The 'one-way' refinement approach should be improved by 'two-way' grid
processing so that the finer scale resolution more effectively influences the global
coarser scales. This would be more in the spirit of a true multilevel algorithm.

• The treatment of the artificial local-grid boundaries should be improved by
other than homogeneous Dirichlet conditions to achieve better conservation.

• The local source terms should somehow be improved to provide more accurate
fine-scale features.

• The intergrid dissipation scheme plays an important role in allowing the
simulation to retain relatively coarse resolution, but the particular choice of the
weights here is somewhat ad hoc. We may need to find a more physically based
rationale for determining these weights.
Figure 6. Contour plots of the spanwise vorticity produced by $V_2$ in plane = 0.1123
at $t = 3T, 4T, \ldots, 7T$ (from top to bottom).

Figure 7. Contour plots of the spanwise vorticity produced by in plane = 0.1123
at $t = 3T, 4T, \ldots, 7T$ (from top to bottom).

Figure 8. Contour plots of the spanwise vorticity produced by in plane = 0.1123
at $t = 3T, 4T, \ldots, 7T$ (from top to bottom).

Figure 9. Contour plots of the spanwise vorticity produced by $V_2 + V_3$ in plane $z_0 = 0$
at $t = 3T, 4T, \ldots, 7T$ (from top to bottom).
A NOTE ON SUBSTRUCTURING PRECONDITIONING FOR
NONCONFORMING FINITE ELEMENT APPROXIMATIONS OF
SECOND ORDER ELLIPTIC PROBLEMS


Serguei Maliassov* 


SUMMARY 

In this paper an algebraic substructuring preconditioner is considered for nonconforming
finite element approximations of second order elliptic problems in 3D
domains with a piecewise constant diffusion coefficient. Using a substructuring idea
and a block Gauss elimination, part of the unknowns is eliminated and the Schur
complement obtained is preconditioned by a spectrally equivalent very sparse matrix.
In the case of a quasiuniform tetrahedral mesh an appropriate algebraic multigrid solver
can be used to solve the problem with this matrix. Explicit estimates of condition
numbers and implementation algorithms are established for the constructed preconditioner.
It is shown that the condition number of the preconditioned matrix does not
depend on either the mesh step size or the jump of the coefficient. Finally, numerical
experiments are presented to illustrate the theory being developed.

1. INTRODUCTION 

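Let $\Omega$ be a convex bounded domain in $\mathbb{R}^3$ with boundary $\partial\Omega$. Consider an elliptic
problem

$$
\begin{aligned}
-\nabla\cdot(k\nabla u) &= f && \text{in } \Omega,\\
u &= 0 && \text{on } \Gamma_0,\\
\frac{\partial u}{\partial n} &= 0 && \text{on } \Gamma_1,
\end{aligned}
\tag{1}
$$

where $k(x)$ is a uniformly positive bounded function, $f(x) \in L^2(\Omega)$, $\Gamma_0 \cup \Gamma_1 = \partial\Omega$,
$\Gamma_0 \cap \Gamma_1 = \emptyset$, and $\Gamma_0 = \overline{\Gamma}_0 \neq \emptyset$.

Note that the approach considered in this paper is also valid for the
Neumann problem, i.e., $\Gamma_0 = \emptyset$; that case is not described here only for the sake of
simplicity.

Let the bilinear form $a(\cdot,\cdot)$ be defined by

$$a(u,v) = (k\nabla u, \nabla v), \qquad u, v \in V_0(\Omega) = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_0\},$$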

*Institute for Scientific Computation and Department of Mathematics, Texas A&M University,
326 Teague Research Center, College Station, TX 77843-3404. e-mail: malyasov@isc.tamu.edu
where $(\cdot,\cdot)$ denotes the $L^2(\Omega)$ inner product. Then the usual weak form of (1) for the
solution $u \in V_0(\Omega)$ is

$$a(u,v) = (f,v), \qquad \forall v \in V_0(\Omega). \tag{2}$$

Let $\mathcal{T}_h$ be a regular partitioning of $\Omega$ into simplexes $T$ with mesh size $h$, and let
$V_h(\Omega)$ be the $P_1$-nonconforming finite element space of functions $v \in L^2(\Omega)$ [1] such
that $v|_T$ is linear for all $T \in \mathcal{T}_h$, and $v$ is continuous at the barycenters of the faces of $T \in \mathcal{T}_h$
and vanishes at the barycenters of the boundary faces on $\Gamma_0$. Note that the space $V_h(\Omega)$
is not a subspace of $H^1(\Omega)$.

Define the bilinear form on $V_h(\Omega)$ by

$$a_h(u,v) = \sum_{T\in\mathcal{T}_h} (k\nabla u, \nabla v)_T, \qquad \forall u,v \in V_h(\Omega), \tag{3}$$

where $(\cdot,\cdot)_T$ is the $L^2(T)$ inner product, $T \in \mathcal{T}_h$. Then the $P_1$-nonconforming finite
element discretization of (1) is to find $u_h \in V_h$ such that

$$a_h(u_h,v) = (f,v), \qquad \forall v \in V_h(\Omega). \tag{4}$$


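Once a basis $\{\varphi_i(x)\}_{i=1}^N$ for $V_h(\Omega)$ is chosen, (4) leads to a system of linear algebraic
equations. Write $u(x) = \sum_{i=1}^N u_i\varphi_i(x)$. Then (4) becomes

$$\sum_{i=1}^N u_i\, a_h(\varphi_i,\varphi_j) = (f,\varphi_j), \qquad j = 1,\ldots,N,$$

or in matrix representation

$$Au = f, \tag{5}$$

where $A_{ji} = a_h(\varphi_i,\varphi_j)$, $f_j = (f,\varphi_j)$, $i,j = 1,\ldots,N$.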

The first efficient solvers for nonconforming finite element approximations were 
proposed and investigated in [1] and [2]. Further developments can be found in [3], 
[4], and [5]. 

In this paper we describe and analyze a method of constructing a preconditioner
for (5) using the idea of algebraic substructuring (see [6] and [7]), which consists
of the following main steps.

First, we represent the matrix $A$ from (5) in a $2 \times 2$ block form

$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},$$

where $A_{ii} : \mathbb{R}^{N_i} \to \mathbb{R}^{N_i}$, $i = 1,2$, $N_1 + N_2 = N$, in such a way that the block $A_{22}$
is easily invertible. With the introduction of the Schur complement $\hat{A}_{11} = A_{11} - A_{12}A_{22}^{-1}A_{21}$,
the matrix $A$ can be rewritten in the form

$$A = \begin{bmatrix} \hat{A}_{11} + A_{12}A_{22}^{-1}A_{21} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}. \tag{7}$$
Then, we reconstruct the directed graph of the Schur complement $\hat{A}_{11}$ in such
a way that the resulting matrix $S$ has the same kernel, is still positive definite (or
positive semidefinite if the matrix $A$ is singular), and is spectrally equivalent to the
matrix $\hat{A}_{11}$, i.e.,

$$c_0(Su,u) \le (\hat{A}_{11}u,u) \le c_1(Su,u), \qquad \forall u \in \mathbb{R}^{N_1},$$

with constants $c_0$ and $c_1$ independent of the mesh step size $h$ and the jump of the
coefficient $k(x)$.

To precondition the matrix $S$ we repeat the same steps. That is, the matrix $S$ is
represented in a $2 \times 2$ block form

$$S = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix},$$

where $S_{ii} : \mathbb{R}^{N_{1i}} \to \mathbb{R}^{N_{1i}}$, $i = 1,2$, $N_{11} + N_{12} = N_1$, in such a way that the block
$S_{22}$ is easily invertible, so that the Schur complement $\hat{S}_{11} = S_{11} - S_{12}S_{22}^{-1}S_{21}$ is easily
computable.

Finally, following the ideas in [8], [9], and [10], we construct a matrix $\tilde{S}_{11}$ spectrally
equivalent to $\hat{S}_{11}$ with constants $0 < d_0 \le d_1$ independent of the mesh size parameter
$h$ and the jump of the coefficient $k(x)$:

$$d_0(\tilde{S}_{11}v,v) \le (\hat{S}_{11}v,v) \le d_1(\tilde{S}_{11}v,v), \qquad \forall v \in \mathbb{R}^{N_{11}}.$$

Then the matrix

$$B = \begin{bmatrix} \begin{bmatrix} \tilde{S}_{11} + S_{12}S_{22}^{-1}S_{21} & S_{12}\\ S_{21} & S_{22}\end{bmatrix} + A_{12}A_{22}^{-1}A_{21} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}$$

is spectrally equivalent to the matrix $A$, i.e.,

$$r_0(Bu,u) \le (Au,u) \le r_1(Bu,u), \qquad \forall u \in \mathbb{R}^N,$$

where $r_0 = \min\{1; c_0\}\cdot\min\{1; d_0\}$ and $r_1 = \max\{1; c_1\}\cdot\max\{1; d_1\}$. In the case of a
quasiuniform mesh and a piecewise constant coefficient $k(x)$, an algebraic multigrid
method (AMG) [11], [4], [9], [10] can be used to construct such a matrix $\tilde{S}_{11}$.

In other words, the reconstruction of the directed graph of the matrix is equivalent
to constructing an equivalent norm on a finite dimensional space. An implementation
of this approach depends on the structure of the graph of the matrix $A$ and, consequently,
on the type of nonconforming finite element space $V_h$. A detailed description of
constructing algebraic substructuring preconditioners for one concrete case of the
$P_1$-nonconforming space $V_h$ was given in [12], [13], and [14]. In all these papers the authors
defined the partitioning of the whole domain by subdividing it into topological
parallelepipeds and splitting each parallelepiped in turn into six tetrahedra. The present
paper extends these results to the case of splitting each topological parallelepiped into
five tetrahedra.




The explicit bounds of the spectrum of the preconditioned matrix are obtained
with the help of the superelement analysis [12], [10], [7], [15].

The outline of the remainder of the paper is as follows. In Section 2 we consider
a formulation of the model problem with a piecewise constant coefficient $k(x)$ when $\Omega$ is a unit
cube. Then, in Section 3 we develop an algebraic substructuring preconditioner for
the resulting linear system and give an implementation algorithm. In Section 4 we
outline the algebraic multigrid method we use to precondition the Schur complement
obtained in Section 3. Finally, the results of the numerical experiments and some
conclusions are given in Section 5 to illustrate the theory being presented.

2. PROBLEM FORMULATION 


To explain our approach we consider the model case when $\Omega$ is a unit cube in $\mathbb{R}^3$,
the boundary conditions are uniform, and $k(x)$ is a piecewise constant function. Note
that an extension of the method to the case of $\Omega$ being a union of parallelepipeds is
straightforward.

Let $\mathcal{C}_h = \{C^{(i,j,k)}\}$ be a partition of $\Omega$ into uniform cubes with edge length
$h = 1/n$, where $(x_i, y_j, z_k)$ is the right back upper corner of the cube $C^{(i,j,k)}$.
Next, divide each cube $C^{(i,j,k)}$ into 5 tetrahedra as shown in Figure 1 and denote this
partitioning of $\Omega$ into tetrahedra by $\mathcal{T}_h$. Note that there are two types of partitioning
of a cube into tetrahedra, and a cube with one type of partitioning has
all adjacent cubes of the other type. Below we assume that the function $k(x)$ is constant
on each cube $C \in \mathcal{C}_h$.



Figure 1 . Partition of cubes into tetrahedra. 


We introduce the set of barycenters of all faces of the tetrahedral partition of $\Omega$ and
the set $Q_h$ of those barycenters not on $\Gamma_0$. The Crouzeix-Raviart $P_1$-nonconforming
finite element space $V_h$ is defined by

$$V_h = \{v \in L^2(\Omega) : v|_T \in P_1(T),\ \forall T \in \mathcal{T}_h;\ v \text{ is continuous at the barycenters from } Q_h \text{ and vanishes at the barycenters of faces on } \Gamma_0\}. \tag{10}$$

Let its dimension be $N$. Note that $N \approx 10n^3$.
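The estimate $N \approx 10n^3$ can be checked by counting faces. The sketch below is illustrative only: the function name is hypothetical, and it assumes that $\Gamma_0$ is the whole boundary of the unit cube (the setting of the numerical experiments). It uses two geometric facts about the five-tetrahedra splitting: each cube contains 4 interior faces (those of the central tetrahedron) and carries 12 triangles on its surface.

```python
def crouzeix_raviart_dofs(n: int) -> dict:
    """Count face unknowns on an n x n x n cube mesh with 5 tetrahedra per cube.

    Assumption (not from the paper's text): Dirichlet data on the whole boundary.
    """
    cubes = n ** 3
    interior_per_cube = 4             # faces of the central tetrahedron
    surface_per_cube = 12             # 2 triangles on each of the 6 cube faces
    boundary_triangles = 12 * n ** 2  # triangles lying on the boundary of the unit cube

    # A surface triangle is shared by two cubes unless it lies on the domain boundary,
    # so adding the boundary triangles once more makes every triangle counted twice.
    surface_triangles = (surface_per_cube * cubes + boundary_triangles) // 2
    total_faces = interior_per_cube * cubes + surface_triangles
    return {
        "faces": total_faces,                                 # 10 n^3 + 6 n^2
        "dofs_dirichlet": total_faces - boundary_triangles,   # 10 n^3 - 6 n^2
    }


for n in (20, 30, 40, 50):
    print(n, crouzeix_raviart_dofs(n)["dofs_dirichlet"])
```

Under that assumption the count is $10n^3 - 6n^2$, which agrees with the system dimensions listed in Table 1.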
Now we define the bilinear form on $V_h$ by

$$a_h(u,v) = \sum_{T\in\mathcal{T}_h}\int_T k(x)\,\nabla u\cdot\nabla v\,dx, \qquad \forall u,v \in V_h. \tag{11}$$

Thus the nonconforming discretization of the problem (1) is given by seeking
$u_h \in V_h$ such that

$$a_h(u_h,v) = (f,v), \qquad \forall v \in V_h. \tag{12}$$

For any function $v_h \in V_h$ we denote by $\mathbf{v} \in \mathbb{R}^N$ the corresponding vector of its degrees
of freedom.

Let $(\mathbf{u},\mathbf{v})_N$ be the standard bilinear form defined on $\mathbb{R}^N$ by $(\mathbf{u},\mathbf{v})_N = \sum_{i=1}^N u_i v_i$,
$\forall u, v \in V_h$. Then the discretization operator $A : \mathbb{R}^N \to \mathbb{R}^N$, which is symmetric and
positive definite, is defined by

$$(A\mathbf{u},\mathbf{v})_N = a_h(u,v), \qquad u,v \in V_h. \tag{13}$$

Similarly, we introduce the vector $\mathbf{f}$ by $(f,v) = (\mathbf{f},\mathbf{v})_N$, $\forall v \in V_h$. Now, problem
(12) can be rewritten in matrix form

$$A\mathbf{u} = \mathbf{f}. \tag{14}$$

For each cube $C = C^{(i,j,k)} \in \mathcal{C}_h$, denote by $V_h^C$ the subspace of the restrictions of
the functions in $V_h$ to $C$. For each $v \in V_h^C$, we indicate by $\mathbf{v}_c$ the corresponding
vector. The dimension of $V_h^C$ is denoted by $N_c$. Obviously, for a cube without faces
on $\Gamma_0$ we have $N_c = 16$.

The local stiffness matrix $A^C$ on a cube $C \in \mathcal{C}_h$ is given by

$$(A^C\mathbf{u}_c,\mathbf{v}_c)_{N_c} = \sum_{T\subset C}(k\nabla u_h,\nabla v_h)_T, \qquad \forall u_h, v_h \in V_h^C. \tag{15}$$

Note that the matrices $A^C$ are positive definite when $\partial C \cap \Gamma_0 \neq \emptyset$ and semidefinite
otherwise. The global stiffness matrix is determined by assembling the local stiffness
matrices:

$$(A\mathbf{u},\mathbf{v})_N = \sum_{C\in\mathcal{C}_h}(A^C\mathbf{u}_c,\mathbf{v}_c)_{N_c}, \qquad \forall \mathbf{u},\mathbf{v} \in \mathbb{R}^N. \tag{16}$$

3. ALGEBRAIC SUBSTRUCTURING PRECONDITIONER OVER A CUBE 

In this section we construct the algebraic substructuring preconditioner outlined
in the Introduction. To this end, we divide all unknowns in the
system into two groups:

1. The first group consists of

(a) one unknown per cube corresponding to the 1st face of those tetrahedra
that are internal for each cube $C \in \mathcal{C}_h$ (see Figure 2, face 1).

(b) all unknowns corresponding to faces of the cubes in the partition $\mathcal{C}_h$, without
the faces on $\Gamma_0$ (Figure 2, faces 2, 3, ..., 13).

2. The second group consists of the unknowns corresponding to the faces of the
tetrahedra that are internal for each cube and that are not in the 1st group
(these are the unknowns on faces 14, 15, and 16 in Figure 2).



Figure 2. Local enumeration of faces in a cube. 


The splitting of the space $\mathbb{R}^N$ induces the representation of the vectors $v^T = (v_1^T, v_2^T)$,
where $v_1 \in \mathbb{R}^{N_1}$ and $v_2 \in \mathbb{R}^{N_2}$, and $v_2$ corresponds to the unknowns
of the second group. Obviously, $N_2 = 3n^3$ and $N_1 = N - 3n^3$. Then the matrix $A$
can be presented in the following block form:

$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}, \tag{17}$$

where $A_{ii} : \mathbb{R}^{N_i} \to \mathbb{R}^{N_i}$, $i = 1,2$.

Note that the matrix $A_{22}$ is block diagonal and can be inverted locally (cube by
cube). Thus, the Schur complement $\hat{A}_{11} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ is easily computable.

The local stiffness matrices on each cube also have the block form:

$$A^C = \begin{bmatrix} A_{11,c} & A_{12,c}\\ A_{21,c} & A_{22,c}\end{bmatrix}, \tag{18}$$

where $A_{22,c}$ are $3 \times 3$ matrices.

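An important fact, which is established by direct computations, is that the matrix
$\hat{A}_{11}$ can be obtained by assembling over all cubes the local matrices
$\hat{A}_{11,c} = A_{11,c} - A_{12,c}A_{22,c}^{-1}A_{21,c}$:

$$(\hat{A}_{11}u_1,v_1) = \sum_{C\in\mathcal{C}_h}\frac{3h}{2}\,k_c\,(\hat{A}_{11,c}u_{1,c},v_{1,c}), \qquad \forall u_1,v_1 \in \mathbb{R}^{N_1}.$$

Here $u_{1,c}$ is the restriction of $u_1$ to the nodes of the first group on the cube $C \in \mathcal{C}_h$,
and for a cube $C \in \mathcal{C}_h$ without faces on $\Gamma_0$ we have $\dim u_{1,c} = 13$.

Let us consider a cube $C$ that has no face on the boundary $\Gamma_0$ and enumerate the
faces $s_j$, $j = 1,\ldots,16$, as shown in Figure 2. Then the local matrices $A_{ij,c}$, $i,j = 1,2$,
of this cube have the following form:

$$A_{11,c} = \begin{bmatrix} 9/2 & -r^T & 0 & 0 & 0\\ -r & I & & & \\ 0 & & I & & \\ 0 & & & I & \\ 0 & & & & I\end{bmatrix}, \qquad A_{22,c} = \frac{1}{2}\begin{bmatrix} 9 & -1 & -1\\ -1 & 9 & -1\\ -1 & -1 & 9\end{bmatrix}, \tag{19}$$

$$A_{12,c}^T = A_{21,c} = \begin{bmatrix} -1/2 & 0\;\,0\;\,0 & -1\;\,{-1}\;\,{-1} & 0\;\,0\;\,0 & 0\;\,0\;\,0\\ -1/2 & 0\;\,0\;\,0 & 0\;\,0\;\,0 & -1\;\,{-1}\;\,{-1} & 0\;\,0\;\,0\\ -1/2 & 0\;\,0\;\,0 & 0\;\,0\;\,0 & 0\;\,0\;\,0 & -1\;\,{-1}\;\,{-1}\end{bmatrix},$$

where $r = (1\ 1\ 1)^T$ and $I$ is the $3 \times 3$ identity matrix.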

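The local Schur complement matrix $\hat{A}_{11,c}$ for this cube has the form

$$\hat{A}_{11,c} = \frac{1}{7}\begin{bmatrix} 30 & -7r^T & -r^T & -r^T & -r^T\\ -7r & 7I & 0 & 0 & 0\\ -r & 0 & T & -R & -R\\ -r & 0 & -R & T & -R\\ -r & 0 & -R & -R & T\end{bmatrix}, \tag{20}$$

where

$$T = \frac{1}{5}\begin{bmatrix} 27 & -8 & -8\\ -8 & 27 & -8\\ -8 & -8 & 27\end{bmatrix}, \qquad R = \frac{1}{5}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}.$$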

Along with the matrix $\hat{A}_{11,c}$ we introduce on each cube $C \in \mathcal{C}_h$ the $13 \times 13$ matrices
$S_c$ by



and define the $N_1 \times N_1$ matrix $S$ by assembling over all cubes the local matrices $S_c$:

$$(Su_1,v_1) = \sum_{C\in\mathcal{C}_h}\frac{3h}{2}\,k_c\,(S_c u_{1,c},v_{1,c}), \qquad \forall u_1,v_1 \in \mathbb{R}^{N_1}.$$

It is easy to see that the matrices $\hat{A}_{11,c}$ and $S_c$ have the same kernel, i.e.,
$\ker \hat{A}_{11,c} = \ker S_c$.

We now consider an eigenvalue problem for $\mu \neq 0$:

$$\hat{A}_{11,c}u = \mu S_c u, \qquad u \neq 0,\ u \in \mathbb{R}^{13}. \tag{22}$$

Direct calculations show that the eigenvalues of this problem belong to the interval
$[\mu_{\min}, \mu_{\max}]$, where $\mu_{\min} = 1/7$ and $\mu_{\max} = 1$.

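Defining a new $N_c \times N_c$ matrix on each cube,

$$B^C = \begin{bmatrix} S_c + A_{12,c}A_{22,c}^{-1}A_{21,c} & A_{12,c}\\ A_{21,c} & A_{22,c}\end{bmatrix}, \tag{23}$$

we define the symmetric positive definite $N \times N$ matrix $B$ by

$$(Bu,v) = \sum_{C\in\mathcal{C}_h}(B^C u_c, v_c), \tag{24}$$

where $u, v \in \mathbb{R}^N$, and $u_c$ and $v_c$ are their respective restrictions to the cube $C$.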

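To estimate the condition number of the matrix $B^{-1}A$ we use the so-called superelement
analysis (see [16], [9], [17], [7]). Namely, it is easy to show the following inequalities:

$$\max_{(Bu,u)\neq 0}\frac{(Au,u)}{(Bu,u)} = \max_{(Bu,u)\neq 0}\frac{\sum_{C\in\mathcal{C}_h}(A^C u_c,u_c)}{\sum_{C\in\mathcal{C}_h}(B^C u_c,u_c)} \le \max_{\substack{C\in\mathcal{C}_h\\ (B^C u_c,u_c)\neq 0}}\frac{(A^C u_c,u_c)}{(B^C u_c,u_c)} \tag{25}$$

and

$$\min_{(Bu,u)\neq 0}\frac{(Au,u)}{(Bu,u)} = \min_{(Bu,u)\neq 0}\frac{\sum_{C\in\mathcal{C}_h}(A^C u_c,u_c)}{\sum_{C\in\mathcal{C}_h}(B^C u_c,u_c)} \ge \min_{\substack{C\in\mathcal{C}_h\\ (B^C u_c,u_c)\neq 0}}\frac{(A^C u_c,u_c)}{(B^C u_c,u_c)}. \tag{26}$$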

From the inequalities (25) and (26) we see that to estimate the condition number of
$B^{-1}A$, it is sufficient to consider the local eigenvalue problems for $\mu_c \neq 0$ on each
cube:

$$A^C u_c = \mu_c B^C u_c, \qquad u_c \neq 0,\ u_c \in \mathbb{R}^{N_c}.$$

From (22), direct calculations show that the eigenvalues $\mu_c$ are within the interval
$[1/7, 1]$. Then the inequalities (25) and (26) yield:

Proposition 1. The eigenvalues of the problem $Au = \lambda Bu$, $u \neq 0$, belong to the
interval $[1/7, 1]$, and the condition number is thus estimated by $\mathrm{cond}(B^{-1}A) \le 7$.
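The superelement argument rests on an elementary fact: a ratio of sums of positive terms lies between the smallest and the largest of the termwise ratios. The sketch below illustrates this on a toy assembly, not on the matrices of the paper; the element connectivity, the random local matrices, and all function names are assumptions made for the illustration.

```python
import numpy as np


def random_spd(rng, size):
    """Random symmetric positive definite matrix (illustrative local 'element' matrix)."""
    m = rng.standard_normal((size, size))
    return m @ m.T + size * np.eye(size)


def assemble(local_mats, dof_maps, n):
    """Sum local matrices into an n x n global matrix using local-to-global index lists."""
    g = np.zeros((n, n))
    for mat, dofs in zip(local_mats, dof_maps):
        g[np.ix_(dofs, dofs)] += mat
    return g


def generalized_eigs(a, b):
    """Eigenvalues of a u = mu b u for symmetric a and SPD b, via a Cholesky factor of b."""
    l_inv = np.linalg.inv(np.linalg.cholesky(b))
    return np.linalg.eigvalsh(l_inv @ a @ l_inv.T)


rng = np.random.default_rng(0)
dof_maps = [[0, 1, 2], [2, 3, 4], [4, 5, 0]]     # three overlapping "superelements"
A_loc = [random_spd(rng, 3) for _ in dof_maps]
B_loc = [random_spd(rng, 3) for _ in dof_maps]

A = assemble(A_loc, dof_maps, 6)
B = assemble(B_loc, dof_maps, 6)

local = np.concatenate([generalized_eigs(a, b) for a, b in zip(A_loc, B_loc)])
glob = generalized_eigs(A, B)

print("local  range:", local.min(), local.max())
print("global range:", glob.min(), glob.max())   # contained in the local range
```

In the paper's setting the local range is $[1/7, 1]$ on every cube, which is how the global bound of 7 follows without any global computation.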

We stress that the condition number of the matrix $B^{-1}A$ is bounded by a constant
independent of the mesh step size $h$ and the jump of the coefficient $k(x)$.

Let us take the matrix $B$ from (24) as a preconditioner for the matrix $A$. In
terms of the group partitioning introduced above it has the following block form:

$$B = \begin{bmatrix} S + A_{12}A_{22}^{-1}A_{21} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}. \tag{27}$$
As we noted earlier, the matrix $A_{22}$ is block diagonal and can be inverted locally on
cubes. So we concentrate on the linear system

$$Sw = G, \qquad w, G \in \mathbb{R}^{N_1}. \tag{28}$$

The matrix $S$ can also be represented in the block form

$$S = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}, \tag{29}$$

where the block $S_{22}$ corresponds to the nodes from subgroup (b) of the first group,
which are on the faces of the cubes $C \in \mathcal{C}_h$. From the definition of $S$, it can be seen that
the matrix $S_{22}$ is diagonal. In the above partitioning, we present $w$ and $G$ from (28) in
the form

$$w = \begin{bmatrix} w_1\\ w_2\end{bmatrix}, \qquad G = \begin{bmatrix} G_1\\ G_2\end{bmatrix}, \tag{30}$$

where the dimension of the vectors $w_1$ and $G_1$ is $M = \dim w_1 = n^3$
and $\dim w_2 = N_1 - n^3$. Then, after elimination of the second group of unknowns,
$w_2 = S_{22}^{-1}(G_2 - S_{21}w_1)$, we get the system of linear equations

$$(S_{11} - S_{12}S_{22}^{-1}S_{21})\,w_1 = G_1 - S_{12}S_{22}^{-1}G_2 \equiv \hat{G}_1, \tag{31}$$

where the vector $w_1$ and the block $S_{11}$ correspond to the unknowns from subgroup
(a) of the first group, which has only one unknown per cube.
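The elimination step leading to (31) can be sketched in a few lines. This is an illustration on a small random system, not the paper's implementation; the function name and the test data are hypothetical, and the only structural assumption taken from the text is that $S_{22}$ is diagonal.

```python
import numpy as np


def solve_by_elimination(S11, S12, S21, s22_diag, G1, G2):
    """Solve [[S11, S12], [S21, diag(s22_diag)]] [w1; w2] = [G1; G2] via the Schur complement."""
    inv_d = 1.0 / s22_diag                       # S22 is diagonal, so inversion is trivial
    schur = S11 - S12 @ (inv_d[:, None] * S21)   # S11 - S12 S22^{-1} S21
    rhs = G1 - S12 @ (inv_d * G2)                # G1 - S12 S22^{-1} G2
    w1 = np.linalg.solve(schur, rhs)             # reduced system, cf. (31)
    w2 = inv_d * (G2 - S21 @ w1)                 # back-substitution for the eliminated unknowns
    return w1, w2


rng = np.random.default_rng(0)
s22_diag = np.array([4.0, 5.0, 6.0])
S12 = rng.standard_normal((2, 3))
S21 = S12.T
# Choose S11 so that the Schur complement is the identity, hence the full matrix is SPD.
S11 = S12 @ (S21 / s22_diag[:, None]) + np.eye(2)
G1, G2 = rng.standard_normal(2), rng.standard_normal(3)

w1, w2 = solve_by_elimination(S11, S12, S21, s22_diag, G1, G2)

S_full = np.block([[S11, S12], [S21, np.diag(s22_diag)]])
w_direct = np.linalg.solve(S_full, np.concatenate([G1, G2]))
print(np.allclose(np.concatenate([w1, w2]), w_direct))
```

In the paper the reduced system is not solved directly as here; it is the system whose matrix is subsequently replaced by a multilevel preconditioner.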

Thus, if we define as above the Schur complement of the matrix $S$ by
$\hat{S}_{11} = S_{11} - S_{12}S_{22}^{-1}S_{21}$, the matrix $B$ can be presented in the form

$$B = \begin{bmatrix} \begin{bmatrix} \hat{S}_{11} + S_{12}S_{22}^{-1}S_{21} & S_{12}\\ S_{21} & S_{22}\end{bmatrix} + A_{12}A_{22}^{-1}A_{21} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}, \tag{32}$$

where the matrix $A_{22}$ is block diagonal and $S_{22}$ is diagonal, so that both can be inverted locally
cube by cube. Again, we stress that the condition number of the matrix $B^{-1}A$
is bounded by a constant independent of the mesh step size $h$ and the jump of the
coefficient $k(x)$. The matrix $B$ can be referred to as a three-level preconditioner.

It is easy to see that the Schur complement $\hat{S}_{11}$ is a “7-point-scheme” matrix. In
the next section we consider solution techniques for problem (31) with the matrix
$\hat{S}_{11}$.


4. MULTILEVEL PRECONDITIONER OVER A CUBE 

While the preconditioner $B$ has good properties, it is still not economical to invert
it because the entries of the matrix $\hat{S}_{11}$ depend on the jump of the coefficients. In
this section we propose a preconditioner for the matrix $\hat{S}_{11}$, provided that additional
assumptions on the behavior of the function $k(x)$ are met, and show that for this
modification we can use any well-known multilevel procedure.

Assumption (A1). Suppose that the unit cube $\Omega$ can be represented as a union of a
certain number $m$ of pairwise disjoint cubes $G_i$, $i = 1,\ldots,m$, with edge size $H$
($H > 2h$) in such a way that in each cube $G_i$ the function $k(x)$ is a positive constant.

In other words, we set $\overline{\Omega} = \bigcup_{i=1}^m \overline{G}_i$ and $k(x) = \mathrm{const}_i > 0$, $x \in G_i$, $i = 1,\ldots,m$.

Now define on an auxiliary parallelepipedal mesh Tc with vertices in the centers 
of cubes G Tc and in the centers of the boundary faces fl dVl. Let us 

consider a standard partitioning of Tc into tetrahedra Th and enumerate the nodes 
of this mesh in accordance with the enumeration of the cubes of Tc- 

Then define the piecewise constant function ^(x) to be constant on each cube 
e fc by 

k{x)= mm ^ x e (33) 

and consider the boundary value problem

$$-\nabla\cdot(\tilde{k}\,\nabla u) = g \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \Gamma_0. \tag{34}$$

Denote by $U_h$ the usual (conforming) finite element space of all continuous piecewise
linear functions on $T_h$ that vanish at the nodes on $\Gamma_0$. Note that $\dim U_h = M$.
Finally, define the symmetric positive definite matrix $C$ by

$$(Cu,v)_M = \int_\Omega \tilde{k}\,\nabla u\cdot\nabla v\,dx, \qquad \forall u,v \in U_h, \tag{35}$$

where the vectors $u$, $v$ on the left-hand side are the vectors of degrees of freedom corresponding
to the functions $u$ and $v$, respectively.

Consider an eigenvalue problem

$$\hat{S}_{11}u = \mu\,Cu, \qquad u \neq 0,\ u \in \mathbb{R}^M. \tag{36}$$

The following statement plays a very important role in all further arguments [15]. It
can be established by straightforward computations.

Proposition 2. The eigenvalues of the problem (36) belong to the interval $[1/2, 1]$.

Now, instead of the matrix (32), we define a new matrix $B$ by

$$B = \begin{bmatrix} \begin{bmatrix} C + S_{12}S_{22}^{-1}S_{21} & S_{12}\\ S_{21} & S_{22}\end{bmatrix} + A_{12}A_{22}^{-1}A_{21} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}. \tag{37}$$

Then we can formulate the following theorem.

Theorem 1. The matrix $B$ defined in (37) with the block $C$ defined in (35) is
spectrally equivalent to the matrix $A$, and $\mathrm{cond}(B^{-1}A) \le 14$.

Thus, we have constructed a spectrally equivalent sparse preconditioner for the
Schur complement after the elimination of almost 90% of the original unknowns. We
note here that the matrices $A_{22}$ and $S_{22}$ are block diagonal and, with $B$ as a preconditioner
for the matrix $A$, we have to develop a procedure for solving the linear system of
equations

$$Cu = G, \qquad u \in \mathbb{R}^M. \tag{38}$$

We stress that the function $\tilde{k}(x)$ is piecewise constant. Thus, any multilevel
procedure that works well for problems of the form (34) can be used.

We apply the preconditioned conjugate gradient method to solve the problem
(13) with the matrix $B$ from (37) as a preconditioner for the matrix $A$ and use the
multilevel domain decomposition method (MGDD) [9], [10], [15] to solve the problem
(38) with the matrix $C$; we establish the following results.

Statement 1. If we use the MGDD method to solve problem (38) with the matrix
$C$, then the condition number $\mathrm{cond}(B^{-1}A)$ does not depend on the mesh size $h$ or the
jump of the coefficient $k(x)$.

Statement 2. The number of operations for solving the system Au = f by the 
preconditioned conjugate gradient method with preconditioner B and with accuracy e 
in the sense 

+ ^ - U*||^ < £|[U° - U*[|^, 

is estimated by C • N - In-, where u* = A"T, u° G IR^, and C does not depend on 
N and jump of the coefficient k{x). 

5. RESULTS OF THE NUMERICAL EXPERIMENTS 


In this section the preconditioner being considered is tested on the model problem

$$-\nabla\cdot(k(x)\nabla u) = f \ \text{ in } \Omega = [0,1]^3, \qquad u = 0 \ \text{ on } \partial\Omega.$$

In the numerical experiments presented we use the preconditioner $B$ in the form
(37). In this case, by Theorem 1, $\mathrm{cond}(B^{-1}A) \le 14$. The problem with the matrix $C$
is solved by the multilevel domain decomposition method, as described in [15].

The domain is divided into $M = n^3$ cubes ($n$ in each direction) and each cube
is partitioned into 5 tetrahedra. The dimension of the original algebraic system
is $N = 10n^3 - 6n^2$. The right-hand side is generated randomly, and the accuracy
parameter is taken as $\varepsilon = 10^{-6}$. The condition numbers of the preconditioned matrices
$B^{-1}A$ are calculated by the relation between the conjugate gradient and Lanczos
algorithms. The coefficient $k(x)$ is piecewise constant and is defined to be

$$k(x,y,z) = \begin{cases} k, & (x,y,z) \in [0.5,1]\times[0.5,1]\times[0.5,1],\\ 1, & \text{elsewhere.}\end{cases} \tag{39}$$
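The relation between the conjugate gradient and Lanczos algorithms mentioned above can be sketched as follows. This is a generic illustration, not the code used for the experiments: the function name, the stopping test on the residual norm, the Jacobi preconditioner, and the 1D test matrix are all assumptions. The CG coefficients $\alpha_j$, $\beta_j$ define a symmetric tridiagonal matrix whose extreme eigenvalues (Ritz values) approximate the extreme eigenvalues of the preconditioned operator.

```python
import numpy as np


def pcg_with_condition_estimate(A, b, apply_prec, tol=1e-6, max_iter=200):
    """Preconditioned CG that also returns a Lanczos-based condition number estimate."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    r0_norm = np.linalg.norm(r)
    alphas, betas = [], []
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        alphas.append(alpha)
        if np.linalg.norm(r) <= tol * r0_norm:    # residual-based stopping test (an assumption)
            break
        z = apply_prec(r)
        rz_new = r @ z
        beta = rz_new / rz
        betas.append(beta)
        p = z + beta * p
        rz = rz_new

    # Lanczos tridiagonal matrix built from the CG coefficients.
    m = len(alphas)
    T = np.zeros((m, m))
    for j in range(m):
        T[j, j] = 1.0 / alphas[j]
        if j > 0:
            T[j, j] += betas[j - 1] / alphas[j - 1]
            T[j, j - 1] = T[j - 1, j] = np.sqrt(betas[j - 1]) / alphas[j - 1]
    ritz = np.linalg.eigvalsh(T)
    return x, m, ritz[-1] / ritz[0]


# Toy usage: 1D Laplacian with a Jacobi (diagonal) preconditioner.
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
x, n_iter, cond_est = pcg_with_condition_estimate(A, b, lambda r: r / 2.0, tol=1e-10)

eigs = np.linalg.eigvalsh(A / 2.0)
print(n_iter, cond_est, eigs[-1] / eigs[0])
```

Because Ritz values lie inside the spectrum, the estimate approaches the true condition number from below as the iteration proceeds.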
The results are summarized in Table 1, where $n_{iter}$ and cond denote the iteration
number and condition number, respectively. All experiments were carried out on a
Sun workstation. It takes approximately 25 minutes to solve the problem of the
largest dimension, $N = 1\,235\,000$.

From Table 1 we see that the condition number does not depend on either the
mesh step size $h$ or the jump of the coefficient $k(x)$.


Table 1. Solving C by MGDD method

            20 x 20 x 20      30 x 30 x 30      40 x 40 x 40      50 x 50 x 50
            N = 77 600        N = 264 600       N = 630 400       N = 1 235 000
   k        n_iter   cond     n_iter   cond     n_iter   cond     n_iter   cond
   1          14     5.32       14     5.30       14     5.29       14     5.28
   10         17     6.59       17     6.53       16     6.37       16     6.29
   100        17     6.94       17     6.90       16     6.89       16     6.88
   1000       17     6.98       16     6.96       16     6.95       16     6.93
   10^4       16     6.98       16     6.96       16     5.95       16     6.94
   0.1        16     5.97       16     5.96       16     5.96       15     5.94
   0.01       16     6.02       16     6.02       16     6.00       15     5.97
   0.001      16     6.02       16     6.01       16     6.00       15     5.97
   10^-4      16     6.02       16     6.01       16     6.00       15     5.97
CONVERGENCE OF A SUBSTRUCTURING METHOD
WITH LAGRANGE MULTIPLIERS*


Jan Mandel and Radek Tezaur 
Center for Computational Mathematics 
University of Colorado at Denver 
Denver, CO 


SUMMARY 

We analyze the convergence of a substructuring iterative method with Lagrange 
multipliers, proposed recently by Farhat and Roux. The method decomposes 
finite element discretization of an elliptic boundary value problem into Neumann 
problems on the subdomains and a coarse problem for the subdomain nullspace 
components. For linear conforming elements and preconditioning by the Dirichlet 
problems on the subdomains, we prove the asymptotic bound on the condition number 
$C(1 + \log(H/h))^{\gamma}$, $\gamma = 2$ or $3$, where $h$ is the characteristic element size and $H$ is the
subdomain size. 


INTRODUCTION 


We analyze the convergence of a substructuring method with Lagrange multipliers,
proposed by Farhat and Roux [11] under the name Finite Element Tearing and
Interconnecting (FETI) method. The main idea of the FETI method is to decompose
the problem domain into nonoverlapping subdomains and to enforce continuity on
subdomain interfaces by Lagrange multipliers. Eliminating the subdomain variables
yields a dual problem for the Lagrange multipliers, which is solved by preconditioned
conjugate gradients. This idea is related to the fictitious domain method where the
Lagrange multipliers enforce boundary conditions as in Dinh et al. [5].

Elimination of the subdomain variables is implemented by solving Neumann
problems on all subdomains in every iteration, which can be done completely in
parallel. However, the subdomain problems are singular, so a small auxiliary problem
for the nullspace components of the subdomain solutions needs to be solved in every
iteration. This is an added complication, but also a blessing. Farhat, Mandel, and
Roux [10] have shown numerically and have proved for the FETI method without
preconditioning that the auxiliary problem plays the role of a coarse problem, namely,
it causes the condition number to be bounded independently of the number of

*This research was supported by the National Science Foundation under grants ASC-9217394
and ASC-9121431. This paper has been submitted for journal publication elsewhere.
subdomains. The method was further extended to time-dependent problems, which
lack the naturally occurring coarse problem, by Farhat, Chen, and Mandel [9].

In this paper, we show that the condition number of the preconditioned
FETI method is bounded independently of the number of subdomains and
polylogarithmically in terms of subdomain size, as is the case for other optimal
nonoverlapping domain decomposition methods [3, 6, 8, 16, 17]. We refer to [10]
for numerical results that confirm the theory and for parallel implementation and
performance.

The FETI method is in a sense dual to the Neumann-Neumann method
with a coarse problem, developed by Mandel under the name Balancing Domain
Decomposition [15] based on an earlier method of de Roeck and LeTallec [4]. A
modified method was analyzed by Dryja and Widlund [8].

Analysis of domain decomposition methods typically proceeds by demonstrating
spectral equivalence of the quadratic form that defines the problem in a variational
setting and the quadratic form that defines the preconditioner, often by way of the
P. L. Lions lemma [1, 6, 7, 14]. Since the preconditioner in the FETI method is quite
complicated and is not defined in terms of a quadratic form, we proceed differently and
find a bound on the norm of the product of the system operator and the preconditioner
to bound the maximal eigenvalue, as well as a bound on the inverse to bound the
minimal eigenvalue. Related analyses were previously done for methods without
crosspoints between the subdomains, or done formally in functional spaces (cf., for
example, Glowinski and Wheeler [12]). In this paper, we present a complete analysis
in terms of upper and lower bounds on the preconditioned operator for decompositions
with crosspoints in 2D and edges and crosspoints in 3D.

FORMULATION OF THE METHOD

In this section, we briefly review formulation of the FETI method according to [10], 
where one can find more details about the algorithmic side. At the same time, we 
introduce the spaces and operators that will be used in our analysis. 

We consider iterative solution of a system of linear equations $Ax = b$ arising from
a finite element discretization of an elliptic boundary value problem on a bounded
domain $\Omega$, which is decomposed into nonoverlapping subdomains $\Omega_i$, $i = 1,\ldots,N_s$.
The matrix $A$ is assumed to be symmetric and positive definite. Let

$$W_i = V_h(\partial\Omega_i) \tag{1}$$

be the space of local vectors of degrees of freedom associated with the boundary of
$\Omega_i$, and let

5'' = K(U'9a') (2) 

i=l 

be the space of global vectors of degrees of freedom associated with all subdomain 
boundaries. The correspondence of the local and global vectors of degrees of freedom 
is given by zero-one matrices W : W: Y. 
We find it convenient to identify vectors of degrees of freedom, which are in some
spaces $\mathbb{R}^n$, with the associated finite element functions. Operators between the spaces
are represented as matrices, and we frequently commit an abuse of notation by using
matrices and operators interchangeably. The $\ell^2$ inner product is denoted by $(\cdot,\cdot)$ on
all spaces. The associated norm is $\|u\|^2 = (u,u)$. The transpose of a matrix $M$ is
denoted by $M'$.

After elimination of the interior degrees of freedom in all subdomains $\Omega_i$, we obtain
the reduced system of linear equations for the vectors $w_i \in W_i$ of degrees of freedom
on subdomain boundaries, which we write in subassembly form as

$$\sum_{i=1}^{N_s} N_i S_i w_i = f, \tag{3}$$

$$\sum_{i=1}^{N_s} B_i w_i = 0. \tag{4}$$

Here, $S_i$ are the Schur complements of the subdomain stiffness matrices obtained
by elimination of the interior degrees of freedom, and $B_i$ are matrices with entries
$0, 1, -1$ such that (4) expresses the continuity of the solution between subdomains,
that is, the requirement that the values of degrees of freedom common to more than
one subdomain coincide.

To describe the method in a concise form, we need to define the following spaces.
$W$ is the space of all boundary degrees of freedom on all subdomains:

$$W = \bigotimes_{i=1}^{N_s} W_i. \tag{5}$$

X is a space of vectors with entries corresponding to pairs of degrees of freedom on 
the interfaces where we enforce continuity: 

ATc (g) (6) 


Denote the block matrix

$$B : W \to X, \qquad B = (B_1,\ldots,B_{N_s}), \tag{7}$$

and the space of Lagrange multipliers

U = Range B. (8) 

These are the details we need for the purpose of describing the method. A more 
specific description of B will be given in the next section. Finally, denote the 
symmetric block diagonal matrix

$$S : W \to W, \qquad S = \begin{pmatrix} S_1 & 0 & \cdots & 0\\ 0 & S_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & S_{N_s}\end{pmatrix}. \tag{9}$$

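The problem (3) and (4) can now be written as the minimization of total
subdomain energy subject to the continuity condition:

$$\tfrac{1}{2}(Sw,w) - (f,w) \to \min, \quad \text{subject to } w \in W,\ Bw = 0. \tag{10}$$

Writing the Lagrangian of this minimization problem,

$$\mathcal{L}(w,\lambda) = \tfrac{1}{2}(Sw,w) - (f,w) + (\lambda,Bw), \qquad w \in W,\ \lambda \in U,$$

we solve the dual problem

$$\max_{\lambda\in U}\,\inf_{w\in W}\mathcal{L}(w,\lambda) = \max_{\lambda\in U}\mathcal{C}(\lambda). \tag{11}$$

By a direct computation,

$$\mathcal{C}(\lambda) = \begin{cases} -\infty & \text{if } (f,w) - (\lambda,Bw) \neq 0 \text{ for some } w \in \operatorname{Ker} S,\\ -\tfrac{1}{2}\bigl(S^+(f - B'\lambda),\, f - B'\lambda\bigr) & \text{otherwise,}\end{cases} \tag{12}$$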

where $S^+ : W \to W$ is any pseudoinverse of $S$, i.e., an operator such that $w = S^+g$
solves $Sw = g$ if $g \perp \operatorname{Ker} S$. It is easy to see from (12) that the choice of $S^+$ does
not change the value of $\mathcal{C}$. Without loss of generality, assume that $S^+$ is given by the
spectral decomposition

$$S^+ = \sum_{t>0}\frac{1}{t}\,v_t v_t', \tag{13}$$

where

$$S = \sum_t t\,v_t v_t', \qquad Sv_t = t\,v_t, \qquad v_t'v_t = 1. \tag{14}$$

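The dual problem (11) is equivalent to maximizing $\mathcal{C}(\lambda)$ on the admissible set

$$\mathcal{A} = \{\lambda \in U \mid \mathcal{C}(\lambda) > -\infty\}.$$

Define the space of admissible increments

$$V = \{\lambda_1 - \lambda_2 \mid \lambda_1 \in \mathcal{A},\ \lambda_2 \in \mathcal{A}\} = \{\mu \in U \mid (\mu,Bw) = 0 \ \ \forall w \in \operatorname{Ker} S\}. \tag{15}$$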
(15) 
At the maximum of $\mathcal{C}(\lambda)$, $\lambda \in \mathcal{A}$, the derivative of $\mathcal{C}$ is zero in all directions in $V$:

$$D\mathcal{C}(\lambda;\mu) = 0 \qquad \forall \mu \in V.$$

By a straightforward computation, this becomes

$$\lambda \in \mathcal{A}, \qquad (-BS^+B'\lambda + BS^+f,\ \mu) = 0 \qquad \forall \mu \in V. \tag{16}$$

In order to express (16) as a linear equation in the space $V$, let $P_V : U \to V$ be the
projection onto $V$ orthogonal in the $\ell^2$ inner product. Then for $\mu \in V$,

$$(-BS^+B'\lambda + BS^+f,\ \mu) = (-BS^+B'\lambda + BS^+f,\ P_V\mu) = (P_V(-BS^+B'\lambda + BS^+f),\ \mu)$$

since $P_V$ is orthogonal, so $P_V = P_V'$. Therefore, the dual problem (11) is equivalent
to the linear equation in $V$ for the unknown $\mu$:

$$\mu \in V, \qquad P_V\bigl(-BS^+B'(\mu + \lambda_0) + BS^+f\bigr) = 0, \tag{17}$$

where $\lambda_0$ is an arbitrary starting feasible solution, i.e., $\lambda_0 \in \mathcal{A}$.

The FETI method is the method of preconditioned conjugate gradients in the
space $V$ applied to the linear equation (17). The linear part of the operator in (17)
is $P_V F$, where

$$F = BS^+B'. \tag{18}$$

We consider the preconditioner $P_V M$, where

$$M = A'SA, \qquad A = \tfrac{1}{2}B'. \tag{19}$$

That is, in each iteration of the preconditioned conjugate gradients algorithm,
$z = P_V M r$ is evaluated as an approximate solution of the residual equation $P_V F z = r$.
The preconditioner (19) was proposed in [10]. Note that the evaluation of the
matrix-vector product $Su$ can be implemented by solving a Dirichlet problem in each
subdomain; therefore it is called the Dirichlet preconditioner in [10].
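
The iteration just described can be illustrated by a small sketch. The code below is not taken from the paper: it assumes dense matrices stored as Python lists, and it assumes that the space $V$ is the orthogonal complement of a single vector `g`, whereas in FETI the space $V$ is defined through all of $\operatorname{Ker} S$. The function names (`projected_pcg`, `project`, `matvec`, `dot`) are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """l2 inner product."""
    return sum(a * b for a, b in zip(x, y))


def project(g, x):
    """Orthogonal projection of x onto V = {v : (g, v) = 0} (assumed form of P_V)."""
    c = dot(g, x) / dot(g, g)
    return [xi - c * gi for xi, gi in zip(x, g)]


def projected_pcg(F, M, g, d, tol=1e-12, max_iter=50):
    """Solve P_V F mu = P_V d for mu in V by PCG with preconditioner P_V M."""
    mu = [0.0] * len(d)
    r = project(g, d)                 # residual at the starting guess mu = 0
    z = project(g, matvec(M, r))      # z = P_V M r
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return mu, it
        q = project(g, matvec(F, p))  # q = P_V F p
        alpha = rz / dot(p, q)
        mu = [m + alpha * pi for m, pi in zip(mu, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = project(g, matvec(M, r))
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return mu, max_iter


# Small symmetric positive definite stand-ins for F and M (illustrative data only).
F = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
M = [[0.25, 0.0, 0.0], [0.0, 1.0 / 3.0, 0.0], [0.0, 0.0, 0.5]]
g = [1.0, 1.0, 1.0]
d = [1.0, 2.0, 3.0]
mu, iterations = projected_pcg(F, M, g, d)
print(mu, iterations)
```

Every iterate stays in $V$ because the search directions are built only from projected vectors, which mirrors the role of $P_V$ in (17).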

ANALYSIS

A well known bound on the reduction of the error in $k$ iterations of the method
of preconditioned conjugate gradients in the norm $|||e||| = (P_V F e, e)^{1/2}$ on $V$ is [13]

$$|||e_k||| \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k} |||e_0|||, \tag{20}$$

where $\kappa$ is the condition number

$$\kappa = \frac{\lambda_{\max}(P_V F P_V M|_V)}{\lambda_{\min}(P_V F P_V M|_V)},$$

and $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of operators on $V$.

Abstract Framework

The main idea of our convergence analysis is summarized in the following lemma,
which we will apply to $F$ and $M$ from (18) and (19).

Lemma 1 Let $U$ be a finite dimensional linear space with the inner product $(\cdot,\cdot)$.
Let $V$ be a subspace of $U$, $\|\cdot\|_V$ be a norm on $V$ induced by an inner product, and
the dual norm be defined by $\|u\|_{V'} = \sup_{v \in V} (v,u)/\|v\|_V$. Let $P_V : U \to V$ be the $(\cdot,\cdot)$
orthogonal projection onto $V$, and $F, M : U \to U$ linear operators symmetric on $V$,

$$(\lambda, F\mu) = (\mu, F\lambda) \quad \forall \lambda, \mu \in V, \qquad (u, Mv) = (v, Mu) \quad \forall u, v \in V,$$

and such that

$$c_1\|\lambda\|_{V'}^2 \le (\lambda, F\lambda) \le c_2\|\lambda\|_{V'}^2 \quad \forall \lambda \in V, \tag{21}$$

$$c_3\|v\|_V^2 \le (v, Mv) \le c_4\|v\|_V^2 \quad \forall v \in V, \tag{22}$$

with constants $c_1, c_2, c_3, c_4 > 0$. Then

$$\kappa = \frac{\lambda_{\max}(P_V M P_V F)}{\lambda_{\min}(P_V M P_V F)} \le \frac{c_2 c_4}{c_1 c_3}. \tag{23}$$

Proof. Since $\lambda \in V$, we can replace in (21) $F$ by $P_V F$. From (21), the operator
norm of the mapping $P_V F : V' \to V$ and its inverse satisfies

$$\|P_V F\|_{V' \to V} \le c_2, \qquad \|(P_V F)^{-1}\|_{V \to V'} \le \frac{1}{c_1}. \tag{24}$$

Similarly, (22) implies

$$\|P_V M\|_{V \to V'} \le c_4, \qquad \|(P_V M)^{-1}\|_{V' \to V} \le \frac{1}{c_3}. \tag{25}$$

Consequently,

$$\lambda_{\max}(P_V M P_V F) \le \|P_V M P_V F\|_{V' \to V'} \le c_2 c_4$$

and

$$\lambda_{\max}((P_V F)^{-1}(P_V M)^{-1}) \le \|(P_V F)^{-1}(P_V M)^{-1}\|_{V' \to V'} \le \frac{1}{c_1 c_3},$$

which gives (23). □

The rest of this paper is concerned with estimating the condition number $\kappa$
from (23). We will specify a suitable norm $\|\cdot\|_V$ and estimate the constants in
Lemma 1 for the finite element problem below.
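
To make concrete what a bound on $\kappa$ buys, the following sketch evaluates the conjugate gradient error bound $2\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^k$ quoted at the start of the analysis and searches for the smallest $k$ that meets a given tolerance. It is only an illustration; the function names are hypothetical and the bound is a worst-case estimate, not a prediction of the actual iteration count.

```python
import math


def pcg_error_bound(kappa, k):
    """Worst-case relative error 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))**k after k PCG steps."""
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k


def iterations_for_tolerance(kappa, tol):
    """Smallest k for which the bound above is at most tol."""
    k = 0
    while pcg_error_bound(kappa, k) > tol:
        k += 1
    return k


# Illustrative values of the condition number only.
for kappa in (4.0, 9.0, 100.0):
    print(kappa, iterations_for_tolerance(kappa, 1e-6))
```

Since the reduction factor depends on $\sqrt{\kappa}$, a condition number that grows only polylogarithmically in $H/h$ translates into a slowly growing iteration estimate.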

Assumptions

We need more specific assumptions in order to be able to prove a bound on the
condition number $\kappa$. We consider the boundary value problem

•Au = g in f2, n = 0 on dfl 

where 



with $\alpha(x)$ a measurable function such that $0 < \alpha_0 \le \alpha(x) \le \alpha_1$ a.e. in $\Omega$.

The domain $\Omega$ is assumed to be divided into nonoverlapping subdomains $\Omega_i$,
$i = 1, 2, \ldots$, which can be generated from a reference domain (square or cube) $\hat\Omega$
of unit diameter as $\Omega_i = F_i(\hat\Omega)$ by mappings $F_i$, which are assumed to satisfy

$$\|\partial F_i\| \le CH, \qquad \|\partial F_i^{-1}\| \le CH^{-1},$$

with the Jacobian $\partial F_i$ and the Euclidean matrix norm $\|\cdot\|$. That is, the
subdomains are shape-regular and have a diameter of $O(H)$.

Assume that $V_h(\Omega)$ is a conforming P1 or Q1 finite element space on a triangulation
of $\Omega$, which satisfies the standard regularity and inverse assumptions. Denote by $h$
the characteristic element size. Each subdomain $\Omega_i$ is assumed to be a union of some
of the elements, and all functions in $V_h(\Omega)$ are zero on $\partial\Omega$.

In particular, the degrees of freedom are values at nodes of the triangulation. We
assume that $B$ is defined as follows. For a pair of degrees of freedom $w_r(x_\alpha)$ on $\partial\Omega_r$
and $w_s(x_\alpha)$ on $\partial\Omega_s$, such that the node $x_\alpha$ does not belong to any other subdomain,
let

$$(Bw)_{rs}(x_\alpha) = \sigma_{rs}\,(w_r(x_\alpha) - w_s(x_\alpha)), \tag{26}$$

where $\sigma_{rs} = 1$ or $\sigma_{rs} = -1$.
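
As an illustration of (26) only, the sketch below assembles the rows of $B$ for nodes shared by exactly two subdomains and checks that $BA = I$ with $A = \frac{1}{2}B'$ in that situation. The global numbering of the duplicated degrees of freedom and the names `jump_matrix`, `apply_matrix`, and `transpose_half` are assumptions made for this example; crosspoints are deliberately left out.

```python
def jump_matrix(n_dofs, interface_pairs, signs=None):
    """Rows of B from (26): one row sigma*(w_r - w_s) per node shared by two subdomains.

    interface_pairs lists (r_dof, s_dof), the indices of the two copies of a node.
    """
    if signs is None:
        signs = [1] * len(interface_pairs)
    rows = []
    for (r, s), sigma in zip(interface_pairs, signs):
        row = [0.0] * n_dofs
        row[r] = float(sigma)
        row[s] = -float(sigma)
        rows.append(row)
    return rows


def apply_matrix(B, w):
    """Compute B w."""
    return [sum(b * wi for b, wi in zip(row, w)) for row in B]


def transpose_half(B):
    """A = (1/2) B', stored as a dense list of rows."""
    return [[0.5 * B[i][j] for i in range(len(B))] for j in range(len(B[0]))]


# Two subdomains with three boundary dofs each; dofs 1 and 2 of the first
# coincide geometrically with dofs 4 and 3 of the second (hypothetical numbering).
B = jump_matrix(6, [(1, 4), (2, 3)], signs=[1, -1])
A = transpose_half(B)
BA = [apply_matrix(B, [A[i][j] for i in range(6)]) for j in range(2)]
print(apply_matrix(B, [0.0, 2.0, 5.0, 5.0, 2.0, 7.0]))  # continuous data: zero jumps
print(BA)
```

With one constraint per shared node and no crosspoints, each row of $B$ has squared length 2 and distinct rows are orthogonal, which is why $BA = I$ holds here.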

When a node $x_\beta$ belongs to more than two subdomains $\partial\Omega_i$, $i = s_1, s_2, \ldots, s_{n_\beta}$, we
assume that $(Bw)_{rs}(x_\beta)$ is defined so that $B$ is full rank and so that the coefficients
are $\pm 1$ and determined uniquely by the indices $(s_1, s_2, \ldots, s_{n_\beta})$. For example,

{Bw)k,k+i{xp) = {-lYws,{xp)-{-lfws^^,{xp), k = l,..,np-l. (27) 

For an example of the definition of the values of $B$ from (27) with $(s_1, s_2, s_3) = (1, 3, 2)$
in 2D around a crosspoint, see Fig. 1.

Remark 2 The essential property here is that there are no redundant constraints in 
enforcing the continuity of the solution at the nodes where more than two subdomains 
meet and that the constraints do not change along the edges (in 3D). Only the improved 
estimate in statement 3 of Lemma 8 will require the specific definition (27).

Figure 1: Definition of B

Discrete Norm Bounds


The key to our analysis is a proper choice of norms. We equip the space $W$ with
the seminorm and the norm

$$|w|_W^2 = \sum_i |w_i|_{1/2,\partial\Omega_i}^2, \qquad
\|w\|_W^2 = |w|_W^2 + \frac{1}{H}\sum_i \|w_i\|_{0,\partial\Omega_i}^2, \tag{28}$$

and the space $V$ with the norm $\|\cdot\|_V$ and the dual norm $\|\cdot\|_{V'}$,

$$\|v\|_V = \|Av\|_W, \qquad \|u\|_{V'} = \sup_{v \in V} \frac{(u,v)}{\|v\|_V}.$$

For the definition and properties of the Sobolev seminorms $|\cdot|_{1/2,\partial\Omega_i}$ see, e.g., [18]. The
space $U$ is identified with some space $\mathbb{R}^n$. We use the $\ell^2$ inner product $(\cdot,\cdot)$ as duality
pairing.

In the following, we use $a \approx b$ to indicate that $ca \le b \le Ca$ with some positive
generic constants $c$ and $C$ independent of the characteristic mesh size $h$ and the
subdomain diameter $H$. First we need to relate our discrete norm to a Sobolev norm
and to establish equivalence of the norm and seminorm on the complement of the
kernel of $S$.


Lemma 3 $|w|_W^2 \approx (w, Sw)$, $w \in W$.

Proof. The lemma follows from the standard result [2, 19]
by summation over all subdomains $\Omega_i$ and using (28). □


Lemma 4 $|w|_W \approx \|w\|_W$, $w \in W$, $w \perp \operatorname{Ker} S$.

Proof. From the equivalence of the norm and seminorm on the factorspace modulo
constants [18] or from the Poincaré inequality, and scaling from a reference domain
to subdomain $\Omega_i$,

$$\frac{1}{H}\|w_i\|_{0,\partial\Omega_i}^2 \le C\,|w_i|_{1/2,\partial\Omega_i}^2$$

for all $w_i$ if $\partial\Omega_i$ contains a part of $\partial\Omega$, and for all $w_i$ such that $\int_{\partial\Omega_i} w_i = 0$ otherwise.
The lemma follows by summation over the subdomains and from (28). □


We also need the equivalence of the norm $\|Av\|_W$ and the seminorm $|Av|_W$.

Lemma 5 $|Av|_W \approx \|Av\|_W$, $v \in V$.

Proof. Let $v \in V$. Since $A = \frac{1}{2}B'$, by definition of $V$, we have $(Av, w) = 0$ $\forall w \in
\operatorname{Ker} S$ or $Av \perp \operatorname{Ker} S$, which yields the result using Lemma 4. □

Our norm on $V$ was chosen so that the preconditioner is coercive and bounded,
i.e., so that (22) holds with $c_3$ and $c_4$ independent of $H$ and $h$. This is shown in the
following lemma.

Lemma 6 $(v, Mv) \approx \|v\|_V^2$ $\forall v \in V$.

Proof. For $v \in V$, by definition of the preconditioner $M$, Lemma 3 and Lemma 5,

$$(v, Mv) = (v, A'SAv) = (Av, SAv) \approx \|v\|_V^2. \qquad \Box$$


The following lemmas lead to estimates of coercivity and ellipticity of F. We 
first summarize some well known results and inequalities in a form suitable for our 
purposes. 

Lemma 7 Let $G$ be a vertex, edge, or face (if $d = 3$) of subdomain $\Omega_i$. A face is
understood not to contain adjacent edges, and an edge does not contain its endpoints.
For $z \in V_h(\partial\Omega_i)$, define $w \in V_h(\partial\Omega_i)$ by $w(x) = z(x)$ on all nodes $x \in G$; $w(x) = 0$
on all other nodes of $\partial\Omega_i$. Then,

where

$\beta = 1$ if $d = 2$ and $G$ is a vertex, or $d = 3$ and $G$ is an edge or a vertex,
$\beta = 2$ if $d = 2$ and $G$ is an edge, or $d = 3$ and $G$ is a face.

Proof. The inequality for $d = 2$ was proved in [16, 17]. The case when $d = 3$ follows
from Lemmas 4.1 and 4.2 in [3] if $G$ is an edge or a vertex and Lemma 4.3 in [3] if $G$
is a face (cf. [6]). □


Lemma 8 It holds that

$$\inf_{\tilde w \in W,\ B\tilde w = Bw} \|\tilde w\|_W^2 \le C\,(1 + \log(H/h))^{\alpha}\,\|ABw\|_W^2, \qquad w \in W,$$

where $\alpha = 1$, and $\alpha = 0$ in the following special cases:

1. $BA = I$, which means that there are no nodes shared by more than two
subdomains.

2. $d = 2$ and the matrix $A$ has the following property: If $w \in \operatorname{Range} A$, $x$ is a
crosspoint (node shared by more than two subdomains), and $w_i(x) = w_i(y)$ for
all $i$ such that $x \in \partial\Omega_i$ and all nodes $y$ that are adjacent to $x$ on $\partial\Omega_i$, then
$w_i(x) = 0$ for all $i$ such that $x \in \partial\Omega_i$.

3. $d = 2$, $B$ is defined by (26) and (27), and all nodes in the triangulation belong
to either one, two, or an odd number of subdomains.

Proof. Let us first prove that in the general case we obtain $\alpha \le 1$. Let $w \in W$ and
$u = Bw$ throughout this proof. From the fact that $BA(BA)^{-1}u = u$, and by the
triangle inequality,

$$\inf_{\tilde w \in W,\ B\tilde w = u} \|\tilde w\|_W \le \|A(BA)^{-1}u\|_W \le \|Au\|_W + \|A(I - (BA)^{-1})u\|_W. \tag{30}$$

Denote $z = A(I - (BA)^{-1})u$. From the definition of $B$ in (26), $z$ is zero at all nodes
that belong to at most two subdomains. The remaining nodes lie on crosspoints or
edges (in the 3D case) of subdomains. From the definition of $B$, at every such node
$x$, $z_i(x)$ is a linear combination of the entries of $Au$ that correspond to the same node
$x$, and the coefficients of the linear combinations are bounded only in terms of the
number of subdomains to which the node belongs. Using Lemma 7 for the crosspoints
of subdomains, we obtain for the 2D case that

$$\|A(I - (BA)^{-1})u\|_W^2 \le C \sum_{x \text{ crosspoint},\ x \in \partial\Omega_i} ((Au)_i(x))^2 \le C\,(1 + \log(H/h))\,\|Au\|_W^2. \tag{31}$$

In the 3D case, the argument for subdomain crosspoints is the same. In addition, we
note that the coefficients of the linear combination do not change along a subdomain
edge, so it remains to apply Lemma 7 on every edge.

Let us now turn to the special cases that give $\alpha = 0$.

If $BA = I$, we choose $\tilde w = Au$ in the following and get

$$\inf_{\tilde w \in W,\ B\tilde w = u} \|\tilde w\|_W \le \|Au\|_W \quad \text{as } B(Au) = u,$$

which proves the special case 1.

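Now we prove special case 2. From the definition of the $H^{1/2}$ norm [18] and the
fact that $Au$ is a piecewise linear function, it follows that

$$\|Au\|_W^2 \ge \sum_i |(Au)_i|_{1/2,\partial\Omega_i}^2 \ge c \sum_{\substack{x \text{ crosspoint},\ x \in \partial\Omega_i \\ y \text{ adjacent to } x,\ y \in \partial\Omega_i}} ((Au)_i(x) - (Au)_i(y))^2. \tag{32}$$

For any crosspoint $x$, it follows from the assumption that for every $w \in \operatorname{Range} A$,

$$\sum_{\substack{i,\ \partial\Omega_i \ni x \\ y \text{ adjacent to } x,\ y \in \partial\Omega_i}} (w_i(x) - w_i(y))^2 = 0 \implies \sum_{i,\ \partial\Omega_i \ni x} (w_i(x))^2 = 0.$$

Consequently, by compactness, and since there are only finitely many different
numbers of subdomains sharing a crosspoint,

$$\sum_{i,\ \partial\Omega_i \ni x} (w_i(x))^2 \le C \sum_{\substack{i,\ \partial\Omega_i \ni x \\ y \text{ adjacent to } x,\ y \in \partial\Omega_i}} (w_i(x) - w_i(y))^2 \quad \forall w \in \operatorname{Range} A.$$

By summation over all crosspoints $x$ and using (31) and (32), we get

$$\|A(I - (BA)^{-1})u\|_W^2 \le C\,\|Au\|_W^2,$$

which concludes the proof of this case.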

In order to prove case 3, we verify the assumptions of case 2. We formulate
only the proof for a crosspoint shared by three subdomains (Fig. 1). The proof
is similar for a different odd number of subdomains. Let $w \in \operatorname{Range} A$. Since
$w_1(x_\beta) - w_1(x_\alpha) = 0$ and $w_1(x_\beta) - w_1(x_\delta) = 0$, we have $w_1(x_\alpha) = w_1(x_\delta)$. Similarly,
we obtain $w_2(x_\alpha) = w_2(x_\gamma)$ and $w_3(x_\delta) = w_3(x_\gamma)$. Now $w \in \operatorname{Range} A$ implies
$w_1(x_\alpha) = -w_2(x_\alpha)$, $w_2(x_\gamma) = -w_3(x_\gamma)$, and $w_3(x_\delta) = -w_1(x_\delta)$, which can be satisfied
only if $w_1(x_\alpha) = w_1(x_\delta) = \ldots = 0$. □

Remark 9 In general, the exponent $\alpha = 1$ in Lemma 8 cannot be improved. To see
that, let us consider the configuration with values of $u$ and $Au$ in the neighborhood
of a crosspoint as in Fig. 2; these values violate the assumptions of special case 2.

Figure 2: Counterexample.

Extending the values of $u$ in Fig. 2 to decay as $\log^{\gamma}(t/H)$, $\gamma < 1/2$, where $t$ is the
distance from the crosspoint, we obtain a function $u \in U$ such that

i|Au||iy ~ C, ||w||fl'i/ 2 (aninan 2 ) ~ I 

If u = Bw, then on dfl-i C (9122, ii = W 2 — wi, we obtain 

I^l//i/2{anin9n2) - K^ilffi/2(anin9n2) + l^2|Hi/2{aninan2) 

< K«i|i^i/2(ani) + |'^2ij/i/2(an2) 
so miBw=u > C{-i)\\oghl!{[* for all 7 < 1 / 2 . 


Lemma 10 Let $\lambda \in V$. Then for all $w \in W$, there is a $\tilde w \in W$ such that
$AB\tilde w \perp \operatorname{Ker} S$ and

$$\frac{(\lambda, Bw)^2}{\|w\|_W^2} \le C\,(1 + \log(H/h))^2\,\frac{(\lambda, B\tilde w)^2}{\|AB\tilde w\|_W^2}.$$

Proof. Let $w \in W$ be arbitrary, and put $\tilde w = w + z$ where $z \in \operatorname{Ker} S$. Since
$\lambda \in V$, we have

$$(\lambda, B\tilde w) = (\lambda, Bw). \tag{33}$$

We would like to have $AB\tilde w \perp \operatorname{Ker} S$, which can be also written as

$$(Bz, B\tilde z) = -(Bw, B\tilde z) \quad \forall \tilde z \in \operatorname{Ker} S.$$

The bilinear form $(Bz, B\tilde z)$ is an inner product on the factorspace $\operatorname{Ker} S/(\operatorname{Ker} S \cap
\operatorname{Ker} B)$, so by the Riesz representation theorem we may conclude that there exists
$z \in \operatorname{Ker} S$ satisfying $\|Bz\| \le \|Bw\|$.

Now, from the definition of $B$ and the norm in $W$, we obtain

$$\|Bz\|^2 \le \|Bw\|^2 \le C\|w\|^2 \le CH\|w\|_W^2.$$

Also, since $z \in \operatorname{Ker} S$, it is constant on each $\partial\Omega_i$, and we have the following by
Lemma 7:

$$\|ABz\|_W^2 \le \frac{C}{H}\,\|Bz\|^2\,(1 + \log(H/h))^2.$$

Together this yields

$$\|ABz\|_W^2 \le C\,(1 + \log(H/h))^2\,\|w\|_W^2.$$

By the definition of $A$ and $B$, $(ABw)_i$ on $\partial\Omega_i \cap \partial\Omega_j$ is a linear combination (with
bounded coefficients) of (a bounded number of) $w_k$ from all $\partial\Omega_k$ adjacent to $\partial\Omega_i \cap \partial\Omega_j$.
From Lemma 7,

$$\|ABw\|_W \le C\,(1 + \log(H/h))\,\|w\|_W \quad \forall w \in W.$$

Finally, summarizing,

$$\|AB\tilde w\|_W \le \|ABw\|_W + \|ABz\|_W \le C\,(1 + \log(H/h))\,\|w\|_W.$$

From this and (33), the result follows. □

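We now have everything ready to prove the estimate (21).

Lemma 11 $c\,(1 + \log(H/h))^{-\alpha}\|\lambda\|_{V'}^2 \le (\lambda, F\lambda) \le C\,(1 + \log(H/h))^{2}\|\lambda\|_{V'}^2$ $\forall \lambda \in V$,
with $\alpha$ defined in Lemma 8.

Proof. From the spectral decomposition (14), define $S^{-1/2} = \sum_{t>0} t^{-1/2}\, v_t v_t'$. Then
$S^+ = S^{-1/2}S^{-1/2}$ and for $\lambda \in V$,

$$\begin{aligned}
(\lambda, F\lambda) &= (S^+B'\lambda, B'\lambda) = (S^{-1/2}B'\lambda, S^{-1/2}B'\lambda) = \|S^{-1/2}B'\lambda\|^2 \\
&= \sup_{x \in W} \frac{(S^{-1/2}B'\lambda, x)^2}{\|x\|^2}
 = \sup_{\substack{x \in W,\ x = x_1 + x_2 \\ x_1 \in \operatorname{Ker} S,\ x_2 \perp \operatorname{Ker} S}} \frac{(B'\lambda, S^{-1/2}x)^2}{\|x_1 + x_2\|^2} \\
&= \sup_{x_2 \in W,\ x_2 \perp \operatorname{Ker} S} \frac{(B'\lambda, S^{-1/2}x_2)^2}{\|x_2\|^2},
\end{aligned}$$

since $S^{-1/2}x_1 = 0$ and $\|x\|^2 = \|x_1\|^2 + \|x_2\|^2$. Now write any $w \in W$ as
$w = w_1 + w_2$, $w_1 \in \operatorname{Ker} S$, $w_2 = S^{-1/2}x_2 \perp \operatorname{Ker} S$.

From the definition of $V$ in (15), $\lambda \in V$ implies that

$$(B'\lambda, w_1) = 0.$$

Since

$$\|x_2\|^2 = (x_2, x_2) = (w_2, Sw_2) \approx |w_2|_W^2 \approx \|w_2\|_W^2$$

from Lemma 3 and Lemma 4, it follows that

$$(\lambda, F\lambda) = \sup_{w_2 \in W,\ w_2 \perp \operatorname{Ker} S} \frac{(B'\lambda, w_2)^2}{\|x_2\|^2} \approx \sup_{w \in W} \frac{(\lambda, Bw)^2}{\|w\|_W^2}.$$

Lemma 8 shows that

$$\sup_{w \in W} \frac{(\lambda, Bw)^2}{\|w\|_W^2}
\ge \sup_{w \in W} \frac{(\lambda, Bw)^2}{\inf_{B\tilde w = Bw}\|\tilde w\|_W^2}
\ge \frac{1}{C\,(1 + \log(H/h))^{\alpha}} \sup_{w \in W} \frac{(\lambda, Bw)^2}{\|ABw\|_W^2}
\ge \frac{1}{C\,(1 + \log(H/h))^{\alpha}} \sup_{\substack{w \in W \\ ABw \perp \operatorname{Ker} S}} \frac{(\lambda, Bw)^2}{\|ABw\|_W^2}.$$

Lemma 10 yields an upper bound

$$\sup_{w \in W} \frac{(\lambda, Bw)^2}{\|w\|_W^2} \le C\,(1 + \log(H/h))^2 \sup_{\substack{w \in W \\ ABw \perp \operatorname{Ker} S}} \frac{(\lambda, Bw)^2}{\|ABw\|_W^2}.$$

Finally, by definition of the norm $\|\cdot\|_{V'}$,

$$\sup_{\substack{w \in W \\ ABw \perp \operatorname{Ker} S}} \frac{(\lambda, Bw)}{\|ABw\|_W} = \sup_{u \in V} \frac{(\lambda, u)}{\|u\|_V} = \|\lambda\|_{V'},$$

since $B$ spans $V$. □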


Condition Number Estimate 


The final result now follows from the abstract estimate in Lemma 1 with the
assumptions verified by Lemma 6 and Lemma 11.

Theorem 12 The condition number of the FETI method with the Dirichlet
preconditioner satisfies

$$\kappa = \frac{\lambda_{\max}(P_V M P_V F)}{\lambda_{\min}(P_V M P_V F)} \le C\left(1 + \log\frac{H}{h}\right)^{\gamma},$$

with $\gamma = 3$, and $\gamma = 2$ in the special cases listed in Lemma 8.

A Systematic Solution Approach for Neutron
Transport Problems in Diffusive Regimes *

T. A. Manteuffel† K. J. Ressel‡


SUMMARY 


A systematic solution approach for the neutron transport equation, based on a
least-squares finite-element discretization, is presented. This approach includes the
theory for the existence and uniqueness of the analytical as well as of the discrete
solution, bounds for the discretization error, and guidance for the development of an
efficient multigrid solver for the resulting discrete problem. To guarantee the accuracy
of the discrete solution for diffusive regimes, a scaling transformation is applied to
the transport operator prior to the discretization. The key result is the proof of the
V-ellipticity and continuity of the scaled least-squares bilinear form with constants
that are independent of the total cross section and the absorption cross section. For
a variety of least-squares finite-element discretizations this leads to error bounds
that remain valid in diffusive regimes. Moreover, for problems in slab geometry a
full multigrid solver is presented with V(1,1)-cycle convergence rates approximately
equal to 0.1, independent of the size of the total cross section and the absorption
cross section.


1. INTRODUCTION 


The deterministic numerical solution of neutron transport problems becomes hard
in diffusive regimes, which are characterized by very large total cross sections and very
small absorption cross sections. In these regimes the transport equation is nearly
singular and its solution in the interior of the computational domain is close to the
solution of a diffusion equation. In order to solve diffusive transport problems
numerically, it is advantageous to use a discretization for the transport operator that
resembles a good approximation of a diffusion operator in diffusive regimes. In the
past, special discretizations for transport problems in slab geometry have been
developed that have this property. Among them are the Diamond Difference scheme
(Lewis and Miller [16]), the Linear Discontinuous scheme (Alcouffe et al. [2]) and
the Modified Linear Discontinuous scheme (Larsen and Morel [15]). However, these
discretizations have the disadvantage that either the solution of the resulting discrete
system (Manteuffel et al. [17], [18]) or their extension to higher dimensions is difficult.

*This work was supported by the DOE under grant DE-FG03-93ER25165 and the NSF under
grant DMS-8704169.

†Program in Applied Mathematics, University of Colorado at Boulder, CB 526, Boulder, CO
80309-0526 (tmanteuf@sobolev.colorado.edu).

‡Interdisciplinary Project Center for Supercomputing (IPS), Clausiusstrasse 59, RZ F-11, ETHZ,
CH-8092 Zurich, Switzerland (kjr@ips.id.ethz.ch).

In this paper we present a general framework for constructing discretizations of 
transport problems that are accurate in diffusive regimes. This framework, which 
is based on a least-squares variational formulation in combination with a scaling 
transformation, represents a systematic solution approach since it includes the theory 
for the existence and uniqueness of the analytical, as well as of the discrete, solution, 
bounds for the discretization error, and guidance for the development of an efficient 
multigrid solver for the resulting discrete problem. 

To introduce our notation we recall that the single group, steady state, isotropic
form of the neutron transport equation is given by (Lewis and Miller [16])

$$\begin{aligned}
[\Omega \cdot \nabla + \sigma_t I - \sigma_s P]\,\psi(r,\Omega) &= q(r,\Omega) && \text{for } (r,\Omega) \in \mathcal{R} \times S^1, \\
\psi(r,\Omega) &= g(r,\Omega) && \text{for } r \in \partial\mathcal{R} \text{ and } n(r) \cdot \Omega < 0,
\end{aligned} \tag{1.1}$$

where $\sigma_t$ is the total cross section, $\sigma_s$ is the scattering cross section, and $\psi(r,\Omega)$ is the
angular flux, to be determined for all points $r = (x, y, z)$ in a region $\mathcal{R} \subset \mathbb{R}^3$ with a
sufficiently smooth boundary (Grisvard [10, p. 5]) and all
possible travel directions $\Omega$ on the unit sphere $S^1$. The operator $P$ is defined by

$$P\psi(r,\Omega) := \frac{1}{4\pi} \int_{S^1} \psi(r,\Omega')\, d\Omega', \tag{1.2}$$

which is an $L^2$-projection onto the space of functions that are independent of direction
angle $\Omega$. The boundary conditions specify the inflow of particles into the region $\mathcal{R}$,
since $n(r)$ denotes the unit outgoing normal at $r \in \partial\mathcal{R}$. Such problems arise as the
inner loop of time-dependent, multienergy-group problems.

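In the case of slab geometry it is assumed that $\partial\psi/\partial x = \partial\psi/\partial y = 0$, so that
$\psi(r,\Omega) = \psi(z,\mu)$ with $\mu := \cos(\theta)$, where $\theta$ denotes the angle between $\Omega$ and the
$z$-axis. Equation (1.1) reduces then to [16]

$$\begin{aligned}
\left[\mu \frac{\partial}{\partial z} + \sigma_t I - \sigma_s P\right] \psi(z,\mu) &= q(z,\mu) && \text{for } (z,\mu) \in [z_l, z_r] \times [-1,1], \\
\psi(z_l,\mu) &= g_l(\mu) && \text{for } \mu > 0, \\
\psi(z_r,\mu) &= g_r(\mu) && \text{for } \mu < 0.
\end{aligned} \tag{1.3}$$

Now, the operator $P$ is defined by

$$P\psi(z,\mu) := \frac{1}{2} \int_{-1}^{1} \psi(z,\mu')\, d\mu', \tag{1.4}$$

which is an $L^2$-projection onto the space of all functions that are independent of $\mu$.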
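
The action of the slab-geometry projection $P$ can be checked numerically. The sketch below approximates the angular average $\frac{1}{2}\int_{-1}^{1}\psi\,d\mu$ at a fixed $z$ with a composite Simpson rule; the quadrature rule, the number of intervals, and the name `isotropic_projection` are choices made for this illustration and are not part of the paper.

```python
def isotropic_projection(psi, n=200):
    """Approximate (P psi) = 1/2 * integral of psi(mu) over [-1, 1] by composite Simpson.

    psi is a function of mu at a fixed position z; n must be even.
    """
    h = 2.0 / n
    total = psi(-1.0) + psi(1.0)
    for j in range(1, n):
        weight = 4.0 if j % 2 else 2.0
        total += weight * psi(-1.0 + j * h)
    return 0.5 * total * h / 3.0


# Example angular dependence (illustrative): psi(mu) = 1 + 3*mu + mu**2.
average = isotropic_projection(lambda mu: 1.0 + 3.0 * mu + mu * mu)
# Applying P to the resulting constant returns the same constant, as expected of a projection.
again = isotropic_projection(lambda mu: average)
print(average, again)
```

For this polynomial the exact average is $\frac{1}{2}(2 + 0 + \frac{2}{3}) = \frac{4}{3}$, and the second application of $P$ leaves it unchanged.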

Without loss of generality, we assume in the following vacuum boundary conditions
($g(r,\Omega) = 0$ in (1.1) and $g_l(\mu) = g_r(\mu) = 0$ in (1.3), respectively) and further that
$\operatorname{diam}(\mathcal{R}) = 1$ in (1.1) and $|z_r - z_l| = 1$ in (1.3), respectively. Both assumptions can
be established by a simple transformation.

When $\sigma_t \to \infty$ and $\sigma_s/\sigma_t \to 1$, equations (1.1) and (1.3) become singular. Dividing
(1.1) or (1.3) by $\sigma_t$ results in the limit equation $(I - P)\psi = 0$. Therefore, the limit
solution is independent of direction angle $\Omega$ and $\mu$, respectively. Moreover, when
$\sigma_t \to \infty$ and $\sigma_s/\sigma_t \to 1$ in a certain way, which is called the diffusion limit, it can
be shown (Larsen [13]) that the limit solution converges to a solution of a diffusion
equation. To be more specific, we introduce the absorption cross section $\sigma_a := \sigma_t - \sigma_s$
and a small parameter $\varepsilon$. The diffusion limit can then be defined as the limit $\varepsilon \to 0$
after scaling the cross sections and the source in the following way:

$$q(r,\Omega) \to \varepsilon\, q(r,\Omega), \qquad \sigma_t \to \frac{1}{\varepsilon}, \qquad \sigma_a \to \varepsilon\alpha, \tag{1.5}$$

where $\alpha$ is assumed to be $O(1)$. In this parameterization the transport equation
becomes

$$\left[\Omega \cdot \nabla + \frac{1}{\varepsilon}(I - P) + \varepsilon\alpha P\right] \psi(r,\Omega) = \varepsilon\, q(r,\Omega). \tag{1.6}$$
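
The parameterization (1.5) can be made tangible with a few numbers. The sketch below, which is only an illustration with a hypothetical function name, computes the cross sections for given $\varepsilon$ and $\alpha$ and shows that the scattering ratio $\sigma_s/\sigma_t = 1 - \varepsilon^2\alpha$ approaches 1 as $\varepsilon \to 0$.

```python
def diffusive_cross_sections(eps, alpha):
    """Cross sections under the scaling (1.5): sigma_t = 1/eps, sigma_a = eps*alpha."""
    sigma_t = 1.0 / eps
    sigma_a = eps * alpha
    sigma_s = sigma_t - sigma_a   # from sigma_a := sigma_t - sigma_s
    return sigma_t, sigma_s, sigma_a


for eps in (1e-1, 1e-2, 1e-3):
    sigma_t, sigma_s, sigma_a = diffusive_cross_sections(eps, alpha=0.5)
    # sigma_t*(I - P) + sigma_a*P are the zeroth-order terms of (1.6)
    print(eps, sigma_t, sigma_a, sigma_s / sigma_t)
```

The identity $\sigma_t I - \sigma_s P = \sigma_t(I - P) + \sigma_a P$ is what turns the zeroth-order terms of (1.1) into those of (1.6).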


Using an asymptotic expansion in $\varepsilon$ it can be proven (Larsen [13], Pomraning [24])
that the solution of (1.6) has the diffusion expansion

$$\psi(r,\Omega) = \phi_0(r) + \varepsilon\,\phi_R(r,\Omega), \tag{1.7}$$

where $\phi_0$ is, at a few mean free paths away from the boundary, a solution of the
diffusion equation

$$-\nabla \cdot \frac{1}{3}\nabla\phi_0(r) + \alpha\,\phi_0(r) = Pq(r,\Omega). \tag{1.8}$$

For the following analysis of a least-squares finite-element discretization of the
transport equation (1.1) we use the form of the transport operator in (1.6).

This paper is organized as follows. In Section 2, we describe briefly the
least-squares finite-element discretization. Further, we introduce and motivate in this
section a scaling transformation that is applied to the transport operator prior to
the discretization in order to ensure the accuracy of the discrete solution for diffusive
regimes. In Section 3, we state that the scaled least-squares bilinear form is continuous
and V-elliptic in a certain norm with constants independent of $\varepsilon$ and $\alpha$. The existence
and uniqueness of the solution of the analytical, as well as of the discrete, problem
then follows directly from the Lax-Milgram Lemma [7].

Furthermore, the continuity and the V-ellipticity, in combination with Céa's
Lemma [7], are the basis for discretization error bounds that are established in
Section 4 for a variety of conforming finite-element spaces. Since the continuity and the
V-ellipticity constants are independent of $\varepsilon$ and $\alpha$, these error bounds remain valid
for diffusive regimes. Thus, the least-squares discretization of the scaled transport
equation with simple conforming finite elements yields an accurate discrete solution,
even in diffusive regimes. In Section 5, we describe a full multigrid solver for problems
in slab geometry and present some convergence rates. Finally, in Section 6 we draw
some conclusions.


2. SCALING TRANSFORMATION 


Let us denote the standard inner product and associated norm of $L^2(\mathcal{R} \times S^1)$ by

$$\langle u, v \rangle := \int_{\mathcal{R}} \int_{S^1} u\, v^*\, d\Omega\, dr, \qquad \|u\| := \langle u, u \rangle^{1/2}, \qquad u, v \in L^2(\mathcal{R} \times S^1), \tag{2.1}$$

where $v^*$ is the complex conjugate¹ of $v$. Further, let $V$ be a Hilbert space with
underlying norm $\|\cdot\|_V$, which we will specify later. Then, the least-squares variational
formulation of (1.1) is given by (see (1.6))

$$\min_{\psi \in V} F(\psi), \qquad \text{with} \qquad F(\psi) := \int_{\mathcal{R}} \int_{S^1} \left| L\psi(r,\Omega) - q(r,\Omega) \right|^2 d\Omega\, dr. \tag{2.2}$$

In order for $\psi \in V$ to be a minimizer of the functional $F$ in (2.2), a necessary condition
is that the first variation of $F$ must vanish at $\psi$ for all admissible $v \in V$, which results
in the following problem: find $\psi \in V$ such that

$$\tilde a(\psi, v) := \langle L\psi, Lv \rangle = \langle q, Lv \rangle \quad \forall v \in V. \tag{2.3}$$

For the least-squares finite-element discretization of (2.2), the Hilbert space $V$ is
replaced by a finite dimensional subspace $V^h \subset V$. This leads to the discrete problem:
find $\psi_h \in V^h$ such that

$$\tilde a(\psi_h, v_h) = \langle q, Lv_h \rangle \quad \forall v_h \in V^h. \tag{2.4}$$

¹We allow here complex valued functions, since we use in Section 4 the expansion of $v$ into
spherical harmonics.

By an asymptotic analysis it was shown in [19] and [25] for slab geometry and $V^h$
formed by piecewise linear basis functions in space and a finite number of Legendre
polynomials as basis functions in angle that this direct least-squares approach is not
accurate in diffusive regimes. This can also be explained by the following heuristic
argument. Because of the diffusion expansion (1.7) the important component of the
solution $\psi$ in diffusive regimes is the part that is independent of direction angle $\Omega$,
which is given by $P\psi$. On the other hand, the component $(I - P)\psi$ of the solution is
irrelevant in diffusive regimes. By Céa's Lemma [7], the solution of the least-squares
discretization can be viewed as the best approximation to the exact solution in the
discrete space with respect to the semi-norm $\sqrt{\tilde a(\psi,\psi)} := \sqrt{\langle L\psi, L\psi \rangle}$. However,
the different terms in the operator $L$, as defined in (1.6), are unbalanced (there are
$O(1/\varepsilon)$, $O(1)$ and $O(\varepsilon)$ terms), so that different components of the approximation error
are weighted differently in $\tilde a(\cdot,\cdot)$. The leading term of $L$ is $\frac{1}{\varepsilon}(I - P)$, which means
that the part of the error that is dependent on angle is weighted in this norm very
strongly in diffusive regimes (very small $\varepsilon$), even though this part is irrelevant. On the
other hand, the part of the error that is independent of angle, which is the important
part in diffusive regimes, is hardly measured in the semi-norm $\sqrt{\tilde a(\cdot,\cdot)}$, since it is
weighted by $\varepsilon$.

The idea is to scale equation (1.6), thus changing the weighting in the norm used
in the least-squares discretization, which, in turn, alters the choice of the element
of the discrete space as an approximation to the exact solution. Let us define the
following scaling transformation and its inverse:

$$S := P + \varepsilon(I - P), \qquad S^{-1} = P + \frac{1}{\varepsilon}(I - P). \tag{2.5}$$

Clearly, applying the scaling transformation $S$ from the left to the transport equation
prior to the least-squares discretization will increase the weight of the important error
component and decrease the weight for the irrelevant component. After applying the
scaling transformation $S$ from the left and dividing by $\varepsilon$, equation (1.6) becomes

$$\mathcal{L}\psi := \frac{1}{\varepsilon} S L \psi = \frac{1}{\varepsilon} S\, \Omega \cdot \nabla\psi + \frac{1}{\varepsilon}(I - P)\psi + \alpha P\psi = q_s, \tag{2.6}$$

with $q_s := Sq$.

Equation (2.6) can be balanced further by applying the scaling transformation $S$
also from the right. Let the domain of operator $\mathcal{L}$ in (2.6) be the Hilbert space $V$.
Then, we define a space $\hat V$ by

$$\hat V := S^{-1}V, \tag{2.7}$$

so that

$$V = S\hat V \quad \text{and} \quad S\hat v = v \tag{2.8}$$

for all $v \in V$ and $\hat v \in \hat V$. Scaling (2.6) from the right results in

$$\mathcal{L}SS^{-1}\psi = \mathcal{L}S\hat\psi = Q \cdot \nabla\hat\psi + (I - P)\hat\psi + \alpha P\hat\psi = q_s, \tag{2.9}$$

where

$$Q := \frac{1}{\varepsilon} S\,\Omega\,S = (1 - \varepsilon)(P\Omega + \Omega P) + \varepsilon\,\Omega I.$$

In the double-scaled operator $\mathcal{L}S$ in (2.9) the derivative of the zeroth moment ($P\nabla\hat\psi$),
the derivative of the first moments ($P\Omega \cdot \nabla\hat\psi$) and all components of $\hat\psi$ themselves
are weighted equally. Moreover, it is easily seen that the double-scaled operator $\mathcal{L}S$
goes to a bounded nonsingular limit operator as $\varepsilon \to 0$.
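
The identity $\frac{1}{\varepsilon} S\,\Omega\,S = (1-\varepsilon)(P\Omega + \Omega P) + \varepsilon\,\Omega$ can be verified numerically in slab geometry, where $\Omega$ is replaced by multiplication by $\mu$. The sketch below is an illustration under stated assumptions: it represents operators as small dense matrices in a truncated basis of normalized Legendre polynomials $p_0, \ldots, p_{n-1}$, in which $P$ keeps only the zeroth coefficient and $\mu$ is tridiagonal. The truncation and all function names are choices made for this example.

```python
import math


def mu_matrix(n):
    """Multiplication by mu in the normalized Legendre basis p_0..p_{n-1} (truncated)."""
    A = [[0.0] * n for _ in range(n)]
    for l in range(1, n):
        a = l / math.sqrt(4.0 * l * l - 1.0)   # coupling between p_{l-1} and p_l
        A[l][l - 1] = a
        A[l - 1][l] = a
    return A


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def projector(n):
    """P keeps only the coefficient of p_0, i.e. the angular average."""
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = 1.0
    return P


def scaling(n, eps):
    """S = P + eps*(I - P) from (2.5)."""
    return [[(1.0 if i == 0 else eps) if i == j else 0.0 for j in range(n)] for i in range(n)]


def q_both_ways(n, eps):
    mu, P, S = mu_matrix(n), projector(n), scaling(n, eps)
    left = [[x / eps for x in row] for row in matmul(matmul(S, mu), S)]
    Pmu, muP = matmul(P, mu), matmul(mu, P)
    right = [[(1.0 - eps) * (Pmu[i][j] + muP[i][j]) + eps * mu[i][j]
              for j in range(n)] for i in range(n)]
    return left, right


left, right = q_both_ways(5, 1e-3)
print(max(abs(left[i][j] - right[i][j]) for i in range(5) for j in range(5)))
```

The check relies on $P\mu P = 0$, the matrix counterpart of the fact that $\mu$ has zero angular average. Letting $\varepsilon \to 0$ leaves only the coupling between the zeroth and first moments, consistent with the bounded limit operator mentioned above.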

In the least-squares context, the additional scaling from the right can be avoided
because

$$\min_{\hat\psi \in \hat V} \langle \mathcal{L}S\hat\psi - q_s,\ \mathcal{L}S\hat\psi - q_s \rangle \iff \min_{\psi \in V} \langle \mathcal{L}\psi - q_s,\ \mathcal{L}\psi - q_s \rangle, \tag{2.10}$$

which will simplify the boundary conditions and so the computations. However, for
the theory we exploit the nice form of the double-scaled operator $\mathcal{L}S$ and use this
form of the transport operator as a tool.

The least-squares variational formulation of the single-scaled equation (2.6) is
given by the problem: find $\psi \in V$ such that

$$a(\psi, v) := \langle \mathcal{L}\psi, \mathcal{L}v \rangle = \langle q_s, \mathcal{L}v \rangle \quad \forall v \in V. \tag{2.11}$$

For the sake of completeness we remark that for slab geometry the form of the
scaling transformation $S$, as defined in (2.5), remains the same, except that for $P$ the
definition (1.4) has to be used. In the case of slab geometry, therefore, equation (2.6)
reduces to

$$\mathcal{L}\psi := \frac{1}{\varepsilon} S L \psi = \frac{1}{\varepsilon} S \mu \frac{\partial\psi}{\partial z} + \frac{1}{\varepsilon}(I - P)\psi + \alpha P\psi = q_s. \tag{2.12}$$


3. CONTINUITY AND V-ELLIPTICITY 


In this section we state without proof that the scaled least-squares bilinear
form (2.11) is continuous, i.e., there exists a constant $C_c > 0$ such that for every
$u, v \in V$

$$|a(u,v)| \le C_c\, \|u\|_V\, \|v\|_V, \tag{3.1}$$

and V-elliptic, i.e., there exists a constant $C_e > 0$ such that for all $v \in V$:

$$a(v,v) \ge C_e\, \|v\|_V^2. \tag{3.2}$$

The Hilbert space $V$ and its norm $\|\cdot\|_V$ are specified below. It is crucial to prove these
bounds with constants $C_c$ and $C_e$ that are independent of $\varepsilon$ and $\alpha$, since this makes it
possible to establish discretization error bounds that remain valid in diffusive regimes.

We first consider the slab geometry case. Let $D := [z_l, z_r] \times [-1,1]$ denote the
computational domain and let $\langle \cdot,\cdot \rangle$ and $\|\cdot\|$ denote the standard inner product and
the associated norm of $L^2(D)$, which are defined by

$$\langle u, v \rangle := \int_{z_l}^{z_r} \int_{-1}^{1} u\, v\, d\mu\, dz \quad \text{and} \quad \|u\| := \sqrt{\langle u, u \rangle}.$$

An appropriate norm for bounding the least-squares bilinear form $a(\cdot,\cdot)$ is then given
by the norm

$$\|v\|_V^2 := \left\| \frac{1}{\varepsilon} S \mu \frac{\partial v}{\partial z} \right\|^2 + \left\| \frac{1}{\varepsilon}(I - P)v \right\|^2 + \|Pv\|^2. \tag{3.3}$$

The Hilbert space $V$ can then be defined by

$$V := \overline{\{ v \in C^\infty(D) :\ v(z_l,\mu) = 0 \text{ for } \mu > 0;\ v(z_r,\mu) = 0 \text{ for } \mu < 0 \}}, \tag{3.4}$$

where the closure is taken with respect to the norm $\|\cdot\|_V$.

From the Cauchy-Schwarz inequality and the discrete Hölder inequality it is easy to
obtain that for all $u, v \in V$

$$|a(u,v)| = |\langle \mathcal{L}u, \mathcal{L}v \rangle| \le \|\mathcal{L}u\|\, \|\mathcal{L}v\| \le 3\, \|u\|_V\, \|v\|_V. \tag{3.5}$$

Thus, the bilinear form (2.11) is continuous with respect to the norm $\|\cdot\|_V$ with
$C_c = 3$.

The proof of the V-ellipticity is much harder and requires several technical lemmas.
For a proof of the following theorem we refer the reader to [20] and [25].


Theorem 3.1 (P-ellipticity of a(-, •) ) Suppose that 0<Q!<1, 0<£< Let 
$a(\cdot,\cdot)$ and $\|\cdot\|_V$ be given as in (2.11) and (3.3), respectively. Then, there exists a
constant $C_e > 0$ such that for all $v \in V$,

$$a(v,v) \ge C_e\, \|v\|_V^2, \tag{3.6}$$

where $C_e = 0.012$, which is independent of $\varepsilon$ and $\alpha$. □

In the case of x-y-z-geometry we let $D := \mathcal{R} \times S^1$ and generalize the definition of
$\|\cdot\|_V$ in (3.3) and the Hilbert space $V$ in the following way:

$$\|v\|_V^2 := \left\| \frac{1}{\varepsilon} S\, \Omega \cdot \nabla v \right\|^2 + \left\| \frac{1}{\varepsilon}(I - P)v \right\|^2 + \|Pv\|^2, \tag{3.7}$$

$$V := \overline{\{ v \in C^\infty(D) :\ v(r,\Omega) = 0 \text{ for } r \in \partial\mathcal{R} \text{ and } \Omega \cdot n(r) < 0 \}}, \tag{3.8}$$

where the closure is now taken with respect to the norm $\|\cdot\|_V$ in (3.7) and $\|\cdot\|$ denotes
the norm in (2.1). The continuity (3.5) and the V-ellipticity (3.6) hold then with
exactly the same constants $C_c$ and $C_e$ as in the slab geometry case.

Together with the Lax-Milgram Lemma [7] the existence and the uniqueness of a
solution for problem (2.11) and its discrete version (4.1), where $V$ is replaced by a
finite dimensional subspace $V^h \subset V$, follows directly. In the next section we will use
the continuity and the V-ellipticity of the bilinear form $a(\cdot,\cdot)$ to prove discretization
error bounds for a variety of discrete spaces $V^h$.


4. DISCRETIZATION ERROR BOUNDS 


In this section we establish bounds for the discretization error xjj — tph- Here, 

Ip E V denotes the solution of (2.11) and iph E C V denotes the solution of the 
corresponding discrete problem: find -iph G such that 

a{iph,Vh) = {qsXvh) yvhEV^. (4.1) 

The continuity and the U-ellipticity of a(-, •) lead directly to Cea’s Lemma [7]: 

a('0 ’iph) < a{'ip -Vh,'ip- Vh) 'ivh G (4.2) 

or 

11^ - '0/illy < min W-ip - VhWv • (4-3) 

Therefore, bounding $\|\psi - \psi_h\|_V$ is reduced to the problem of bounding
$\min_{v_h \in V^h} \|\psi - v_h\|_V$, which is a problem of approximation theory and depends on
the space $V^h$. Here we consider discrete spaces that are formed by functions that can be
expanded into the first $N$ Legendre polynomials (spherical harmonics in the case of
x-y-z geometry) with respect to the direction angle $\mu$ ($\Omega$) and are piecewise polynomials
of degree $k$ in $z$ ($\mathbf{r}$) on a partition $T_h$ of the slab $[z_l, z_r]$ (region $\mathcal{R}$).
This class of finite-dimensional subspaces corresponds to a discretization by a spectral
method in angle and a finite-element discretization in space. The spectral discretization
in angle with Legendre polynomials (spherical harmonics) is common for transport problems
[16] and is also called a $P_N$-discretization.

Again, we consider the slab geometry case first. Let $T_h = \{z_l =: z_0, z_1, \ldots, z_n := z_r\}$
be a partition of the slab $[z_l, z_r]$ with maximum mesh size $h$ and let $\mathbb{P}_k(T_h)$ denote
the space of piecewise polynomials of degree $\le k$ on the partition $T_h$. Further, let $P_l$
denote the $l$-th Legendre polynomial. The normalized Legendre polynomials
$p_l(\mu) := \sqrt{2l+1}\, P_l(\mu)$ then form an orthonormal basis of $L^2([-1,1])$. Thus, any
$\psi \in V$ has the following expansion in angle,

$$\psi(z,\mu) = \sum_{l=0}^{\infty} \phi_l(z)\, p_l(\mu), \tag{4.4}$$

where the Fourier coefficients $\phi_l(z)$, which are called moments in transport theory,
are given by

$$\phi_l(z) = \frac{1}{2} \int_{-1}^{1} p_l(\mu)\, \psi(z,\mu)\, d\mu . \tag{4.5}$$
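
The normalization and the moment formula can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the inner product $\frac{1}{2}\int_{-1}^{1}\cdot\,d\mu$ implied by the factor $\sqrt{2l+1}$, and the helper names (`legendre`, `p_norm`, `half_integral`, `moments`, `truncated_expansion`) are hypothetical. It evaluates the normalized Legendre polynomials by the three-term recurrence, computes moments of an angular profile at a fixed spatial point, and rebuilds the profile from its first $N$ moments.

```python
import math

def legendre(l, mu):
    """Legendre polynomial P_l(mu), evaluated with the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, mu
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * mu * p - n * p_prev) / (n + 1)
    return p

def p_norm(l, mu):
    """Normalized polynomial p_l = sqrt(2l + 1) * P_l."""
    return math.sqrt(2 * l + 1) * legendre(l, mu)

def half_integral(f, n=2000):
    """(1/2) * integral of f over [-1, 1] by the composite Simpson rule (n even)."""
    h = 2.0 / n
    total = f(-1.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(-1.0 + i * h)
    return 0.5 * total * h / 3.0

def moments(psi, N):
    """First N moments phi_l of an angular profile psi(mu) at a fixed z."""
    return [half_integral(lambda mu, l=l: p_norm(l, mu) * psi(mu)) for l in range(N)]

def truncated_expansion(psi, N):
    """P_N-type approximation: sum of the first N moments times p_l(mu)."""
    coeffs = moments(psi, N)
    return lambda mu: sum(c * p_norm(l, mu) for l, c in enumerate(coeffs))
```

For a nearly isotropic profile such as `psi = lambda mu: 1.0 + 0.3 * mu`, two moments already reproduce the profile, which is the situation the diffusive regime corresponds to.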


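For the discretization we truncate the expansion in (4.4) and approximate the
moments $\phi_l(z)$ by piecewise polynomials on the partition $T_h$. This results in the
discrete space

$$\begin{aligned}
V^h := \Big\{ v_h \in C^0(D) :\ & v_h = \sum_{l=0}^{N-1} \phi_l(z)\, p_l(\mu) \ \text{with } \phi_l \in \mathbb{P}_k(T_h) \ \text{for } l = 0, \ldots, N-1; \\
& v_h(z_l, \mu) = 0 \ \text{for } \mu > 0, \quad v_h(z_r, \mu) = 0 \ \text{for } \mu < 0 \Big\}.
\end{aligned} \tag{4.6}$$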


Let $|\cdot|_{\nu,0}$ denote the standard seminorm of $H^{\nu}([z_l, z_r]) \times L^2([-1,1])$. Combining
Cea's Lemma and standard finite-element approximation bounds, and using the fact that
the Legendre polynomials are eigenfunctions of the Sturm-Liouville operator [9, p. 21],
that is,

$$L_s\, p_l(\mu) := -\frac{\partial}{\partial \mu}\left[(1-\mu^2)\,\frac{\partial p_l}{\partial \mu}\right] = l(l+1)\, p_l(\mu),$$

the following discretization error bound can be established (see [20] and [25]).


Theorem 4.1 (Discretization Error bound for slab geometry) Suppose 0 < a < 1 
and 0 < e < Let ip E V n (jI^'^^{[zi,Zr]) x i?^([—1,1])^ be the solution of 
(2.11) with Qs E H^{[zi,Zr]) x iT^([—1,1]). Further, let iph E be the solution 
of (4.1) with $V^h$ defined as in (4.6). Assume that $\psi$ has the diffusion expansion
$\psi(z,\mu) = \phi_0(z) + \varepsilon\, \phi_R(z,\mu)$. Then,


11 ^ - P^hWv < 


1 /Cl 

vc:[n 




Ls 


dip 


(4.7) 


+ 



{C'sZt* {\(po\k+ifi + \4>R\k+ifi) + ef} eh, 
with $C_1, C_2, C_3$ independent of $\alpha$ and $\varepsilon$. In particular


P p 


dj-ip - 

dz 


< eeh, 


{I-P)[h 


dj-ip - iphY 
dz ■ 


< en, 


\{I - P){'tp - ifh)]] < eeh, 

WPif’- A)\\ < eh. □ 


For the definition of the boundary error term in (4.7) we refer to [20]. However, the
following remark explains the source of this error.


Remark 4.2 (Treatment of Boundary Conditions) In order to have $V^h \subset V$, which
is necessary for Cea's Lemma, we incorporated the boundary conditions in the definition
(4.6) of the discrete space $V^h$. However, in conjunction with a $P_N$ discretization
in angle, these boundary conditions can only be fulfilled by a discrete function if
$\phi_l(z_l) = \phi_l(z_r) = 0$ for $l = 0, 1, \ldots, N-1$. Therefore, the boundary conditions for the
discrete problem are really given by $v_h(z_l,\mu) = v_h(z_r,\mu) = 0$ for $\mu \in [-1,1]$. The
difference from the real boundary conditions ($v(z_l,\mu) = 0$ for $\mu \in [0,1]$ and $v(z_r,\mu) = 0$ for
$\mu \in [-1,0]$) is measured in the error bound (4.7) by the boundary error term. In diffusive
regimes, where the analytical solution is nearly independent of $\mu$, we have $v(z_l,\mu) \approx 0$
for $\mu \in [-1,0]$ and $v(z_r,\mu) \approx 0$ for $\mu \in [0,1]$, so that the boundary error will be small.
However, for nondiffusive problems, it is not, in general, true that the inflow of particles
is nearly equal to the outflow. In this case, the boundary error will, in general, be large.

One way to avoid this difficulty would be to use nonconforming finite element
subspaces, that is, to require that functions in the discrete subspace obey Mark or
Marshak boundary conditions [8]. Since then $V^h \not\subset V$, Strang's Lemma [6] instead of
Cea's Lemma must be used in order to establish error bounds.

Another, more natural, way to address this issue would be to incorporate the 
boundary conditions directly into the least-squares functional. For example, one 
could add to the bilinear form a(-, •) in (2.11) the boundary form 

Zi, p)v{zi, p)dp - ^ pifiZr, p)v{Zr, p) df^j 

and use a discrete space with functions that are free of any boundary constraint. 
Error bounds based on this approach will appear in a forthcoming paper. □ 

Remark 4.3 (Nondiffusive regimes) In order to get an error bound in (4.7) with
a constant that is independent of the parameter $\varepsilon$, it is assumed in Theorem 4.1 that
the analytical solution has a diffusion expansion. For regimes where the diffusion
expansion is not valid, $1/\varepsilon$ is of moderate size, so that there is no need for an error
bound that is independent of $\varepsilon$. In this case the second term on the right-hand side
of (4.7) simplifies to

^Czhl^ (^1 + - j l^lfe+1,0 + ef I • 

However, we point out that this bound will blow up in diffusive regimes, where $1/\varepsilon$
becomes very large. □



Now, we generalize the error bounds for slab geometry to x-y-z geometry. Let $T_h$
be a triangulation of $\mathcal{R}$ into tetrahedra of maximum diameter $h$. Recall that the
spherical harmonics [3, p. 571] are defined by

Yrid,^) := i-irCi,mPr{cos{6))e^^^, 


for Z > 0 and where 


Cl, 


m 




{2l + l){l-m)\ 


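Here $P_l^m$ denotes the associated Legendre polynomials, and $\theta$ denotes the polar angle with
respect to the $z$-axis, while $\varphi$ denotes the azimuthal angle about the $z$-axis. The
spherical harmonics form an orthonormal basis of $L^2(S^1)$. Therefore, any
$v \in L^2(\mathcal{R} \times S^1)$ has an expansion of the form

$$v(\mathbf{r},\Omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \phi_{l,m}(\mathbf{r})\, Y_l^m(\Omega), \quad \text{with} \quad \phi_{l,m}(\mathbf{r}) = \int_{S^1} v(\mathbf{r},\Omega)\, \overline{Y_l^m(\Omega)}\, d\Omega . \tag{4.8}$$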

Similar to the slab geometry case, we truncate this expansion for the discretization
and approximate the moments $\phi_{l,m}$ by a function $\phi_{l,m}^h \in \mathbb{P}_k(T_h)$, where $\mathbb{P}_k(T_h)$
denotes the space of piecewise polynomials of degree $\le k$ on the triangulation $T_h$.
Thus, we define the following class of discrete spaces:

$$V^h := \Big\{ v_h \in V :\ v_h(\mathbf{r},\Omega) = \sum_{l=0}^{N-1} \sum_{m=-l}^{l} \phi_{l,m}^h(\mathbf{r})\, Y_l^m(\Omega); \ \phi_{l,m}^h \in \mathbb{P}_k(T_h) \Big\}, \tag{4.9}$$

which correspond to a finite-element discretization in space and a $P_N$ discretization
[16] in angle.

Let $|\cdot|_{k+1,0}$ denote the seminorm of $H^{k+1}(\mathcal{R}) \times L^2(S^1)$. As in the slab geometry
case, we combine Cea's Lemma, standard finite-element approximation bounds, and
the fact that the spherical harmonics are the eigenfunctions of the Laplacian
operator $\Delta_\Omega$ on the unit sphere to obtain the following discretization error bound
(see [20] and [25]).
Theorem 4.4 (Discretization Error Bound for x-y-z geometry) Suppose $0 < \alpha < 1$
and 0 < e < ^. Let ■0 G E fi x be the solution of (2.11) with 

Qs G H’^iflZ) X Further, let 'tph G be the solution of (4.1) with defined 

as in (4.9). Assume that $\psi$ has the diffusion expansion (1.7). Then, we have:


0 - f’hWv < 


(ll^ng.li + |An0|i^o) 

(l0o|fc+i,o + I0i?|fc+i,o) +ef} , 
V 


(4.10) 


with $C_1$ and $C_2$ independent of $\varepsilon$ and $\alpha$. □


5. MULTIGRID SOLVER 


The accuracy of the least-squares discretization in combination with the scaling
transformation for diffusive transport problems has been demonstrated numerically
in [19], [20], and [25]. In this section we restrict the presentation of numerical
results to a full multigrid solver for problems in slab geometry. We refer the reader
who is not familiar with multigrid methods to Briggs [5] for an introduction and to
Hackbusch [11] and McCormick [21], [22], [23] for more advanced topics.

The proper choice of the components, namely, the inter-grid transfer operators,
coarse grid problems, and relaxation schemes, is essential for the efficiency of a
multigrid solver. The choice of the first two components is naturally given by the
least-squares variational formulation. The sequence of discrete spaces
$V_1 \subset V_2 \subset \cdots \subset V_l = V^h$ determines the coarse grid problems since they are just the
restriction of the variational problem to these discrete subspaces. The prolongation
operator, which is a mapping from a coarse grid to the next finer grid in the grid
sequence, is formed directly by composing the isomorphisms between the discrete spaces
and their corresponding coordinate spaces with the injection mapping between $V_{i-1}$ and
$V_i$ (Bramble [4]), (McCormick [23]). The restriction operators, which are mappings from
a finer grid to the next coarser grid, are just the adjoints of the prolongation
operators. Therefore, the only multigrid components that need to be chosen here are the
sequence of discrete spaces and the relaxation.
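
The variational construction of the coarse grid problems can be illustrated on a much simpler model. The sketch below is an assumption-laden stand-in, not the transport operator of this paper: it uses the one-dimensional stiffness matrix of linear elements for $-u''$ with homogeneous Dirichlet conditions, a prolongation `P` given by linear interpolation (the matrix form of the injection of the coarse space into the fine space), the restriction $P^T$, and the coarse operator $P^T A P$. All function names are hypothetical. Because the spaces are nested, the coarse operator coincides with the stiffness matrix assembled directly on the coarse partition.

```python
def stiffness(n, h):
    """Stiffness matrix of linear elements for -u'' on n interior nodes, mesh size h."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h
        if i > 0:
            A[i][i - 1] = -1.0 / h
        if i < n - 1:
            A[i][i + 1] = -1.0 / h
    return A

def prolongation(nc):
    """Linear interpolation from nc coarse interior nodes to 2*nc + 1 fine nodes."""
    nf = 2 * nc + 1
    P = [[0.0] * nc for _ in range(nf)]
    for j in range(nc):
        P[2 * j][j] = 0.5       # fine node to the left of coarse node j
        P[2 * j + 1][j] = 1.0   # fine node coinciding with coarse node j
        P[2 * j + 2][j] = 0.5   # fine node to the right of coarse node j
    return P

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def galerkin_coarse(A, P):
    """Coarse operator P^T A P: the fine problem restricted to the coarse subspace."""
    return matmul(transpose(P), matmul(A, P))
```

With `nc = 3` and fine mesh size `h = 1/8`, `galerkin_coarse(stiffness(7, 1/8), prolongation(3))` equals `stiffness(3, 1/4)`, so no separate coarse discretization has to be chosen.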

For the discrete subspaces, we use finite-element spaces with linear basis elements 
on increasingly finer partitions (halving the spatial cells) of the slab. 

As relaxation we employ a line moment relaxation that updates all moments
simultaneously for a given spatial point. Our computational tests showed essentially
no differences in the error reduction and smoothing properties of this line relaxation
scheme for different orderings of the spatial points. To save computation, we
use this line relaxation scheme in a red-black fashion, since then the residual after
one relaxation sweep is zero at the black points and need not be computed for the
restriction to the next coarser grid. This scheme is also more amenable to advanced
computer architectures.

Table 5.1: Multigrid convergence factors for a V(1,1)-cycle.

    σ_t      α = 1.0   α = 0.5   α = 0.25   α = 0.1   α = 0.0
    10^0     0.052     0.086     0.083      0.118     0.169
    10^1     0.091     0.092     0.091      0.117     0.136
    10^2     0.056     0.056     0.071      0.106     0.131
    10^3     0.092     0.093     0.092      0.105     0.127
    10^4     0.095     0.094     0.094      0.106     0.129
    10^5     0.095     0.094     0.093      0.107     0.130
    10^6     0.095     0.092     0.092      0.107     0.130
    10^7     0.095     0.092     0.092      0.107     0.130
    10^8     0.095     0.092     0.092      0.107     0.130
    10^9     0.095     0.094     0.092      0.107     0.130
    10^10    0.095     0.094     0.092      0.106     0.130
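
The reason the residual vanishes at the black points can be seen on a scalar analogue. The following sketch is an illustration under stated assumptions, not the line moment relaxation of this paper: it applies a red-black Gauss-Seidel sweep to the three-point discretization of $-u'' = f$ with zero boundary values, treating even-indexed nodes as red and odd-indexed nodes as black. The function names are hypothetical. Since the black points are updated last and their neighbors are all red, the black equations are satisfied exactly after the sweep.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for (2u_i - u_{i-1} - u_{i+1}) / h^2 = f_i."""
    n = len(u)
    for start in (0, 1):  # start = 0: red (even) points, start = 1: black (odd) points
        for i in range(start, n, 2):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])
    return u

def residual(u, f, h):
    """Residual f - A u of the three-point scheme with zero boundary values."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r
```

After one call to `red_black_sweep`, the entries of `residual` at odd indices are zero up to rounding, so only the red residuals have to be evaluated before restriction.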

The convergence rates for a V(1,1)-cycle of this multigrid algorithm, which uses
one relaxation before and one relaxation after the coarse grid correction, are listed in
Table 5.1. Even for very large values of $\sigma_t = 1/\varepsilon$, we get V(1,1)-cycle convergence
factors of order 0.1. These convergence factors are sufficient to get a solution with an
error on the order of the discretization error by one single full-multigrid cycle.


6. CONCLUSION 


The least-squares finite-element discretization with piecewise linear basis functions
in space directly applied to the neutron transport equation does not yield a
correct discrete solution in diffusive regimes. However, in combination with a scaling
transformation applied to the transport operator prior to the discretization, the
least-squares discretization is accurate for diffusive regimes and represents a
systematic, general solution approach.

This approach, which converts the first-order transport problem into a variational
form with a symmetric bilinear form, is systematic because it includes the theory for
the existence and uniqueness of the analytical as well as the discrete solution,
bounds for the discretization error, and guidance for the development of an efficient
multigrid solver for the resulting discrete system.

The key results are the $V$-ellipticity and the continuity of the scaled least-squares
bilinear form with constants independent of $\varepsilon$ and $\alpha$. They make it possible to
establish error bounds that remain valid in diffusive regimes. Together with the freedom
to choose a discrete space, this approach yields a general framework for finding
discretizations for the transport equation that are accurate in diffusive regimes.

Because of its generality, this approach opens many possibilities for future work.
The use of different discrete spaces can be explored. For example, one may consider
finite elements as basis functions for discretization of the angle dependence instead
of Legendre polynomials or spherical harmonics. The boundary conditions could
be incorporated directly into the least-squares functional, which would be a more
appropriate treatment of the boundary conditions. Adaptive refinement could be
combined with the multigrid solver in order to resolve boundary layers. Finally, it
appears that it is possible to generalize the scaling transformation to anisotropic
transport problems.


First-Order System Least-Squares for Second-Order Elliptic
Problems with Discontinuous Coefficients 


Thomas A. Manteuffel Stephen F. McCormick Gerhard Starke* 


Abstract 

The first-order system least-squares methodology represents an alternative to standard
mixed finite element methods. Among its advantages is the fact that the finite
element spaces approximating the pressure and flux variables are not restricted by the
inf-sup condition and that the least-squares functional itself serves as an appropriate
error measure. This paper studies the first-order system least-squares approach for scalar
second-order elliptic boundary value problems with discontinuous coefficients. Ellipticity
of an appropriately scaled least-squares bilinear form is shown independently of the
size of the jumps in the coefficients, leading to adequate finite element approximation
results. The occurrence of singularities at interface corners and cross-points is discussed,
and a weighted least-squares functional is introduced to handle such cases. Numerical
experiments are presented for two test problems to illustrate the performance of this
approach.


Introduction 

The purpose of this paper is to apply the first-order system least-squares approach 
developed in [4] and [5] to scalar second-order elliptic boundary value problems in two 
dimensions with discontinuous coefficients. Such problems arise in various application 
areas, including flow in heterogeneous porous media (see, e.g., [12]), neutron transport 
[1], and biophysics [7]. In many physical applications, one is interested not only in an 
accurate approximation of the physical quantity that satisfies the scalar equation, but 
also in certain of its derivatives. For example, fluid flow in a porous medium can be 
modelled by the equation 

$$-\nabla \cdot (a \nabla p) = f \tag{1}$$

for the pressure $p$, where the scalar function $a$ may have large jump discontinuities across
interfaces. Of particular interest here is accurate approximation of the fluid velocity

$$\mathbf{u} = a \nabla p , \tag{2}$$

a concern which led to the development of mixed finite element methods (see, e.g., [3, 
Chapter 10]). In mixed methods, both p and u are approximated by not necessarily 
identical finite elements and, roughly speaking, a Galerkin condition is imposed on the 
first-order system resulting from (1) and (2). 

An alternative to mixed finite elements is the first-order system least-squares approach
developed and analyzed, e.g., in [4], [5], [11], and [10]. This methodology replaces
the Galerkin condition by the minimization of a least-squares functional associated
with a first-order system derived from (1) and (2). Augmenting the basic system
with the curl-condition $\nabla \times (\mathbf{u}/a) = 0$ (see [5], [10]) leads to ellipticity with respect
to the $H^1(\Omega)$ norm in the individual variables. Important practical advantages of this
least-squares approach over standard mixed methods are: (i) the finite element spaces
approximating the pressure and flux variables are not restricted by the inf-sup condition
of Ladyzhenskaya-Babuska-Brezzi (cf. [3, Section 10.5]) and (ii) the least-squares
functional serves as an appropriate error measure. Moreover, if the problem is sufficiently
regular (e.g., if $a \in C^{1,1}(\Omega)$ and $\Omega$ has certain properties (cf. [5])), then (iii) optimal
accuracy is guaranteed in each variable, including the velocities, in the $H^1$ norm and
(iv) optimal computational complexity for the solution of the resulting discrete systems
is achieved with standard multigrid methods (see [5]).

For problems with discontinuous coefficients, which is our focus in this paper, the
velocity components will, in general, not be in $H^1(\Omega)$. While the theory developed in [4]
and [5] already allows for discontinuous coefficients, special care must be taken in order
to prove ellipticity, in an appropriate norm, with constants independent of the size of
the jumps. For this purpose, an appropriate scaling of the least-squares functional that
depends on the size of $a$ in different parts of the domain is introduced. This results
in ellipticity, independently of the size of coefficient jumps, and consequently in finite
element approximation results, with respect to a norm that is suitably scaled depending
on the size of $a$. This scaling is presented in the following section.

At interface corners and cross-points (i.e., where two smooth interface components
intersect), the components of $\mathbf{u}$ will, in general, be unbounded, and singularities
naturally arise (see, for example, Strang and Fix [14, Ch. 8]). The shape of these singularities
is determined by the angle at an interface corner (or between two intersecting interfaces)
and the jumps in the coefficients. We will show how the parameters describing these
singularities can be computed from the coefficient jumps and corner angles. We are
particularly interested in the exponent associated with the singular function at a corner or
cross-point since this indicates how much we have to unweight the least-squares
functional in the neighborhood of such a point. The performance of this scaled least-squares
approach will be studied using bilinear finite elements for the pressure and fluxes (based
on the same grid) and a full multigrid algorithm for the solution of the resulting discrete
system. Finally, computational experiments for two test problems are presented.

Our restriction to two-dimensional problems is mainly for the purpose of exposition.
However, some technical complications arise for three-dimensional problems. For
example, two different types of singularities, associated with edges and with corners or
cross-points, arise in three dimensions. We do not examine this in the present paper.


The Least-Squares Functional 


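Consider the following prototype problem on a bounded domain $\Omega \subset \mathbb{R}^2$:

$$\begin{aligned}
-\nabla \cdot (a \nabla p) &= f && \text{in } \Omega , \\
p &= 0 && \text{on } \Gamma_D , \\
\mathbf{n} \cdot (a \nabla p) &= 0 && \text{on } \Gamma_N ,
\end{aligned} \tag{3}$$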

where $\mathbf{n}$ denotes the outward unit vector normal to the boundary, $f \in L^2(\Omega)$, and
$a(x_1, x_2)$ is a scalar function that is uniformly positive and bounded in $\Omega$ but may have
large jumps across interfaces. We assume that $\Gamma_D \neq \emptyset$, so that the Poincare-Friedrichs
inequality

$$\|p\|_{0,\Omega} \le \gamma\, \|\nabla p\|_{0,\Omega} \tag{4}$$

holds and (3) has a unique solution in $H^1(\Omega)$. Following [5], we rewrite (3) as a
first-order system by introducing the flux variable $\mathbf{u} = a \nabla p$:


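$$\begin{aligned}
\mathbf{u} - a \nabla p &= 0 && \text{in } \Omega , \\
-\nabla \cdot \mathbf{u} &= f && \text{in } \Omega , \\
p &= 0 && \text{on } \Gamma_D , \\
\mathbf{n} \cdot \mathbf{u} &= 0 && \text{on } \Gamma_N .
\end{aligned} \tag{5}$$

Since $\mathbf{u}/a = \nabla p$ with $p \in H^1(\Omega)$, we have (cf. [6, Theorem 2.9])

$$\nabla \times (\mathbf{u}/a) = \partial_1 (u_2/a) - \partial_2 (u_1/a) = 0 \quad \text{in } \Omega .$$

Moreover, the homogeneous Dirichlet boundary condition on $\Gamma_D$ implies the tangential
flux condition

$$\mathbf{n} \times (\mathbf{u}/a) = (n_1 u_2 - n_2 u_1)/a = 0 \quad \text{on } \Gamma_D .$$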

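Adding these equations to first-order system (5) yields the augmented system

$$\begin{aligned}
\mathbf{u} - a \nabla p &= 0 && \text{in } \Omega , \\
-\nabla \cdot \mathbf{u} &= f && \text{in } \Omega , \\
\nabla \times (\mathbf{u}/a) &= 0 && \text{in } \Omega , \\
p &= 0 && \text{on } \Gamma_D , \\
\mathbf{n} \cdot \mathbf{u} &= 0 && \text{on } \Gamma_N , \\
\mathbf{n} \times (\mathbf{u}/a) &= 0 && \text{on } \Gamma_D .
\end{aligned} \tag{6}$$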


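In addition to $L^2(\Omega)$ and $H^1(\Omega)$ with the respective norms $\|\cdot\|_{0,\Omega}$ and $\|\cdot\|_{1,\Omega}$, we will
need the spaces

$$H(\mathrm{div};\Omega) = \{ \mathbf{v} \in L^2(\Omega)^2 : \nabla \cdot \mathbf{v} \in L^2(\Omega) \} ,$$

$$H(\mathrm{curl}\,a;\Omega) = \{ \mathbf{v} \in L^2(\Omega)^2 : \nabla \times (\mathbf{v}/a) \in L^2(\Omega) \}$$

and

$$\begin{aligned}
V &= \{ q \in H^1(\Omega) : q = 0 \text{ on } \Gamma_D \} , \\
\mathbf{W} &= \{ \mathbf{v} \in H(\mathrm{div};\Omega) \cap H(\mathrm{curl}\,a;\Omega) : \mathbf{n} \cdot \mathbf{v} = 0 \text{ on } \Gamma_N ,\ \mathbf{n} \times (\mathbf{v}/a) = 0 \text{ on } \Gamma_D \} .
\end{aligned} \tag{7}$$

Clearly, for the solution of (3), we have $p \in V$ and $\mathbf{u} \in \mathbf{W}$, so it is appropriate to pose
(6) on these spaces.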

As mentioned above, our main interest is in the solution of (3) when $a(x_1, x_2)$ has
large jumps. Following Bramble, Pasciak, Wang, and Xu [2], we assume that

$$\overline{\Omega} = \bigcup_{i=1}^{J} \overline{\Omega}_i$$

with $\{\Omega_i\}$ being mutually disjoint open polygonal regions; that the restriction of
$a(x_1, x_2)$ to $\Omega_i$ is in $C^1(\Omega_i)$; and that

$$c_1 \omega_i \le a(x_1, x_2) \le c_2 \omega_i \quad \text{for } (x_1, x_2) \in \Omega_i$$

with constants $c_1, c_2$ of order one and arbitrary positive constants $\omega_i$. In other words,
$a(x_1, x_2)$ is assumed to be of approximate size $\omega_i$ throughout $\Omega_i$ for each $i$, while large
variations in $\{\omega_i\}$ over $i$ are allowed. The bounds derived below will be independent of
this variation in $\{\omega_i\}$, but the constants in these bounds will depend on the variation
within each $\Omega_i$, that is, on $c_1$ and $c_2$.

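An appropriate scaling of the equations in (6) leads to the least-squares functional

$$G(\mathbf{u},p;f) = \|\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p\|_{0,\Omega}^2 + \|\nabla \cdot \mathbf{u} + f\|_{0,\Omega}^2 + \|a\,\nabla \times (\mathbf{u}/a)\|_{0,\Omega}^2 \tag{8}$$

and associated bilinear form

$$\begin{aligned}
\mathcal{F}(\mathbf{u},p;\mathbf{v},q) = {} & (\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p,\ \mathbf{v}/\sqrt{a} - \sqrt{a}\,\nabla q)_{0,\Omega} \\
& + (\nabla \cdot \mathbf{u}, \nabla \cdot \mathbf{v})_{0,\Omega} + (a\,\nabla \times (\mathbf{u}/a),\ a\,\nabla \times (\mathbf{v}/a))_{0,\Omega} .
\end{aligned} \tag{9}$$

Here, for the sake of notational simplicity, we agree that $(\cdot,\cdot)_{0,\Omega}$ is meant componentwise
for vector functions. That is, if $\mathbf{w} = (w_1, w_2)$ and $\mathbf{z} = (z_1, z_2)$, then

$$(\mathbf{w},\mathbf{z})_{0,\Omega} = (w_1, z_1)_{0,\Omega} + (w_2, z_2)_{0,\Omega} .$$

The solution of (5) will also solve the minimization problem

$$G(\mathbf{u},p;f) = \min_{(\mathbf{v},q) \in \mathbf{W} \times V} G(\mathbf{v},q;f) \tag{10}$$

and, therefore, the variational problem

$$\mathcal{F}(\mathbf{u},p;\mathbf{v},q) = -(f, \nabla \cdot \mathbf{v})_{0,\Omega} \quad \text{for all } (\mathbf{v},q) \in \mathbf{W} \times V . \tag{11}$$

Here we show that $\mathcal{F}(\mathbf{v},q;\mathbf{v},q)$ is uniformly equivalent to the square of the scaled norm
defined for $(\mathbf{v},q) \in \mathbf{W} \times V$ by

$$|||(\mathbf{v},q)||| = \left( \|\nabla \cdot \mathbf{v}\|_{0,\Omega}^2 + \|a\,\nabla \times (\mathbf{v}/a)\|_{0,\Omega}^2 + \|\mathbf{v}/\sqrt{a}\|_{0,\Omega}^2 + \|\sqrt{a}\,\nabla q\|_{0,\Omega}^2 \right)^{1/2} .$$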

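Theorem 1 Under the above assumptions, there exist constants $\gamma_1$ and $\gamma_2$, independent
of the size of the jumps in $\{\omega_i\}$, such that

$$\mathcal{F}(\mathbf{u},p;\mathbf{u},p) \ge \gamma_1\, |||(\mathbf{u},p)|||^2 \quad \text{for all } (\mathbf{u},p) \in \mathbf{W} \times V \tag{12}$$

and

$$\mathcal{F}(\mathbf{u},p;\mathbf{v},q) \le \gamma_2\, |||(\mathbf{u},p)|||\ |||(\mathbf{v},q)||| \quad \text{for all } (\mathbf{u},p), (\mathbf{v},q) \in \mathbf{W} \times V . \tag{13}$$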


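Proof. The proof is similar to the proof of [4, Theorem 3.1] (see also [10, Theorems
2.1 and 2.2]). We include it here because we must confirm that the constants $\gamma_1$ and $\gamma_2$
are independent of the jumps in $a$. The main part of the proof consists in showing that
the functionals

$$\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{v},q) = (\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p,\ \mathbf{v}/\sqrt{a} - \sqrt{a}\,\nabla q)_{0,\Omega} + (\nabla \cdot \mathbf{u}, \nabla \cdot \mathbf{v})_{0,\Omega}$$

and

$$\hat{\mathcal{S}}(\mathbf{u},p;\mathbf{v},q) = (\mathbf{u}/\sqrt{a}, \mathbf{v}/\sqrt{a})_{0,\Omega} + (\sqrt{a}\,\nabla p, \sqrt{a}\,\nabla q)_{0,\Omega} + (\nabla \cdot \mathbf{u}, \nabla \cdot \mathbf{v})_{0,\Omega}$$

satisfy

$$c_1\, \hat{\mathcal{S}}(\mathbf{u},p;\mathbf{u},p) \le \hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p) \tag{14}$$

and

$$\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{v},q) \le c_2\, (\hat{\mathcal{S}}(\mathbf{u},p;\mathbf{u},p))^{1/2} (\hat{\mathcal{S}}(\mathbf{v},q;\mathbf{v},q))^{1/2} \tag{15}$$

with constants $c_1$ and $c_2$ that are independent of the jumps in $a$.

For the proof of (14), we rewrite Poincare-Friedrichs inequality (4) as

$$\|p\|_{0,\Omega}^2 \le \gamma\, \|\sqrt{a}\,\nabla p\|_{0,\Omega}^2 . \tag{16}$$

Note that $\gamma$, and consequently the quantity $\gamma_1$ in (12), depends on $\min_{x \in \Omega} a(x) > 0$. It
does not introduce, however, any dependence of (12) and (13) on the size of the jumps
in $a$. Since on $\partial\Omega$ we either have $p = 0$ or $\mathbf{n} \cdot \mathbf{u} = 0$, integration by parts confirms
that

$$(\mathbf{u}, \nabla p)_{0,\Omega} + (\nabla \cdot \mathbf{u}, p)_{0,\Omega} = 0 .$$

For any $\tau > 0$, which we specify later, we have

$$\begin{aligned}
\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p)
= {} & (\mathbf{u}/\sqrt{a}, \mathbf{u}/\sqrt{a})_{0,\Omega} + (\sqrt{a}\,\nabla p, \sqrt{a}\,\nabla p)_{0,\Omega} - 2(\mathbf{u}, \nabla p)_{0,\Omega} + (\nabla \cdot \mathbf{u}, \nabla \cdot \mathbf{u})_{0,\Omega} \\
& + 2\tau (\nabla \cdot \mathbf{u}, p)_{0,\Omega} + 2\tau (\mathbf{u}, \nabla p)_{0,\Omega} + \tau^2 (p,p)_{0,\Omega} - \tau^2 (p,p)_{0,\Omega} \\
= {} & (\mathbf{u}/\sqrt{a} + (\tau - 1)\sqrt{a}\,\nabla p,\ \mathbf{u}/\sqrt{a} + (\tau - 1)\sqrt{a}\,\nabla p)_{0,\Omega} \\
& + (\nabla \cdot \mathbf{u} + \tau p,\ \nabla \cdot \mathbf{u} + \tau p)_{0,\Omega} - \tau^2 (p,p)_{0,\Omega} + (2\tau - \tau^2)(\sqrt{a}\,\nabla p, \sqrt{a}\,\nabla p)_{0,\Omega} \\
\ge {} & (2\tau - \tau^2)(\sqrt{a}\,\nabla p, \sqrt{a}\,\nabla p)_{0,\Omega} - \tau^2 (p,p)_{0,\Omega} \\
\ge {} & (2\tau - (1 + \gamma)\tau^2)\, \|\sqrt{a}\,\nabla p\|_{0,\Omega}^2 .
\end{aligned}$$

Choosing $\tau = 1/(1 + \gamma)$ leads to

$$\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p) \ge \tau\, \|\sqrt{a}\,\nabla p\|_{0,\Omega}^2 .$$

We then also have

$$\|\mathbf{u}/\sqrt{a}\|_{0,\Omega}^2 \le 2 \left( \|\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p\|_{0,\Omega}^2 + \|\sqrt{a}\,\nabla p\|_{0,\Omega}^2 \right) \le 2(1 + 1/\tau)\, \hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p)$$

and, clearly,

$$\|\nabla \cdot \mathbf{u}\|_{0,\Omega}^2 \le \hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p) ,$$

which completes the proof of (14).

Upper bound (15) follows from

$$\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{v},q) \le 2\, (\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p))^{1/2} (\hat{\mathcal{F}}(\mathbf{v},q;\mathbf{v},q))^{1/2}$$

and

$$\begin{aligned}
\hat{\mathcal{F}}(\mathbf{u},p;\mathbf{u},p) &= \|\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p\|_{0,\Omega}^2 + \|\nabla \cdot \mathbf{u}\|_{0,\Omega}^2 \\
&\le 2 \left( \|\mathbf{u}/\sqrt{a}\|_{0,\Omega}^2 + \|\sqrt{a}\,\nabla p\|_{0,\Omega}^2 + \|\nabla \cdot \mathbf{u}\|_{0,\Omega}^2 \right) = 2\, \hat{\mathcal{S}}(\mathbf{u},p;\mathbf{u},p) .
\end{aligned} \tag{17}$$

The proof of Theorem 1 is completed by adding the term $\|a\,\nabla \times (\mathbf{u}/a)\|_{0,\Omega}^2$ to both
sides of the inequalities (14) and (17). ∎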
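
The choice of the free parameter in the proof can be checked with a few lines of arithmetic. The sketch below is only an illustration with hypothetical function names; it assumes the lower-bound factor has the form $2\tau - (1+\gamma)\tau^2$, as in the chain of inequalities above. This factor is a concave quadratic in $\tau$, so it is maximized at $\tau = 1/(1+\gamma)$, where its value is $\tau$ itself.

```python
def lower_bound_factor(tau, gamma):
    """Factor multiplying ||sqrt(a) grad p||^2 in the lower bound: 2*tau - (1 + gamma)*tau^2."""
    return 2.0 * tau - (1.0 + gamma) * tau * tau

def best_tau(gamma):
    """Maximizer of the concave quadratic lower_bound_factor(., gamma)."""
    return 1.0 / (1.0 + gamma)
```

For example, with a Poincare-Friedrichs constant `gamma = 1.0` the best factor is `0.5`; a larger `gamma` (smaller minimum of the coefficient) gives a smaller ellipticity constant, in line with the remark following the theorem.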

Theorem 1 states that ellipticity and continuity of the least-squares bilinear form
$\mathcal{F}(\cdot,\cdot;\cdot,\cdot)$ in terms of the norm $|||(\cdot,\cdot)|||$ is independent of the jumps in $a$. Note, however,
that the ellipticity constant $\gamma_1$ in (12) depends on the size of $a$, in particular, on the
positive constant $\min_{x \in \Omega} a(x)$ through the Poincare-Friedrichs inequality (16).

The scaling of the norm $|||(\cdot,\cdot)|||$ has the following physical interpretation. In areas
where $a$ is relatively small, $\nabla p$ is allowed to be relatively large, and one has to expect a
less accurate approximation there compared to areas where $a$ is large and $\nabla p$ is therefore
small. In contrast, the velocity $\mathbf{u} = a \nabla p$ can be expected to be more accurate in areas
where $a$ is small and less accurate, in general, where $a$ is large. Ellipticity with constants
that are independent of the jumps in $a$ asserts that the scaling in $\mathcal{F}(\cdot,\cdot;\cdot,\cdot)$ correctly
reflects these attributes.

Singularities at Interface Corners and Cross-Points 

This section is concerned with the behavior of $p$ and $\mathbf{u}$ at or near the interface curve.
Most of what we present in this section is well-known; we refer to Strang and Fix [14,
Chapter 8] for further details.

Recall from the previous section that the solution of (6) satisfies
$\mathbf{u} \in H(\mathrm{div};\Omega) \cap H(\mathrm{curl}\,a;\Omega)$. This implies that, at a point on a smooth segment of the
interface curve, the normal component $\mathbf{n} \cdot \mathbf{u}$ and the tangential component $\mathbf{n} \times (\mathbf{u}/a)$ must
be continuous. Assume that $\Omega = \Omega^+ \cup \Omega^-$ with constant diffusion coefficients $a^+$ and $a^-$,
respectively, and let $\mathbf{u}^+ = (u_1^+, u_2^+)$ and $\mathbf{u}^- = (u_1^-, u_2^-)$ denote the solution restricted
to the respective subdomains (see Figure 1). Then $u_1$ and $u_2$ must satisfy the jump
conditions

$$n_1 u_1^+ + n_2 u_2^+ = n_1 u_1^- + n_2 u_2^- , \qquad n_2 \frac{u_1^+}{a^+} - n_1 \frac{u_2^+}{a^+} = n_2 \frac{u_1^-}{a^-} - n_1 \frac{u_2^-}{a^-} . \tag{18}$$

For example, consider the situation shown in Figure 1 (which we will encounter again as
Example 2 in the final section of this paper). Across the vertical part of the interface,
$u_1 = \mathbf{n} \cdot \mathbf{u}$ will be continuous while $u_2 = \mathbf{n} \times \mathbf{u}$ has a jump factor of $a^+/a^-$. Similarly,
across the horizontal part of the interface, $u_1 = -\mathbf{n} \times \mathbf{u}$ has a jump factor of $a^+/a^-$
while $u_2 = \mathbf{n} \cdot \mathbf{u}$ is continuous. At the interface corner, both of these conditions must be
satisfied, i.e., $u_1$ and $u_2$ must jump by a factor $a^+/a^-$ and be continuous at the same
time. Obviously, there are only two ways for this to happen: either $\mathbf{u} = 0$ or $\mathbf{u} = \infty$ at
the interface corner. In general, the latter case is encountered at interface corners, that is,
the behavior of $\mathbf{u}$ is singular there.

Figure 1: Interface with corner

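Without loss of generality, assume that the singularity occurs at the origin, and
consider the polar coordinate representation

$$(x_1, x_2) = (r \cos\theta,\ r \sin\theta) .$$

The solution of (3) then admits the representation

$$p(r,\theta) = \begin{cases}
r^{\alpha} (\lambda_1^+ \cos \alpha\theta + \lambda_2^+ \sin \alpha\theta) + \tilde{p}^+(r,\theta) , & \text{in } \Omega^+ , \\
r^{\alpha} (\lambda_1^- \cos \alpha\theta + \lambda_2^- \sin \alpha\theta) + \tilde{p}^-(r,\theta) , & \text{in } \Omega^- ,
\end{cases}$$

where $\tilde{p}^+ \in H^2(\Omega^+)$, $\tilde{p}^- \in H^2(\Omega^-)$ (cf. [14, Section 8.1]), $\alpha \in (1/2, 1)$, and
$\lambda_i^+, \lambda_i^-$ are constants. Using

$$\nabla = \begin{pmatrix} \cos\theta\, \partial_r - \frac{1}{r} \sin\theta\, \partial_\theta \\[2pt] \sin\theta\, \partial_r + \frac{1}{r} \cos\theta\, \partial_\theta \end{pmatrix} \tag{19}$$

leads to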


ui(r, 9) 


r“ ^(A+cos(a - 1)0 + A+sin(a - 1)61) 4-Mi'(r, 0) , in n+ 
r““i(A7 cos(a - 1)0 + X- sin(a - 1 ) 0 ) + 0 ) , in Q“ 


U2{r, 0 ) 


^(-A+sin(a - 1)0 + A+cos(a - 1)0) + wj(r, 0) , in 0+ 
^“-^(-A^ sin(a - 1)0 + Aj cos(a - 1)0) + U 2 {r,d) , in 


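with $\tilde{u}_i^+ \in H^1(\Omega^+)$ and $\tilde{u}_i^- \in H^1(\Omega^-)$. The parameters $\alpha, \lambda_1^+, \lambda_2^+, \lambda_1^-$, and $\lambda_2^-$
are computed such that conditions (18) are fulfilled. Setting $\mu = a^+/a^-$ leads to the
matrix equation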

-/isma^TT //cosQ;|7r 


// sin aTT 


cos OCTT 
For this homogeneous system of linear equations to have a nontrivial solution, its
determinant must vanish, which leads to

$$\frac{1}{2} \left( \mu + \frac{1}{\mu} \right) (\cos \pi\alpha - \cos 2\pi\alpha) + 2 - \cos \pi\alpha - \cos 2\pi\alpha = 0 . \tag{22}$$

The exponent $\alpha$ that determines the degree of the singularity apparently depends on the
size of the jump $\mu$. It can be shown that (22) always has a unique solution $\alpha \in (1/2, 1)$.
For $\mu \to 1$, i.e., as the jump disappears, we have $\alpha \to 1$, i.e., the singularity disappears
as well. For $\mu \to 0$ or $\mu \to \infty$, $\alpha$ tends to 2/3, which is exactly the value obtained for a
reentrant corner with exterior angle $\pi/2$. It is straightforward to extend the procedure
outlined above to any number of adjoining subdomains and any size of angles (cf. [8]).
We therefore have a computational technique to compute the shape of the singularity
at interface corners and cross-points where two interfaces intersect. This technique will
be fundamental for the finite element approach described in the next section.
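
The dependence of the exponent on the jump can be explored numerically. The sketch below is not from the paper: it assumes that (22) reads $\frac{1}{2}(\mu + 1/\mu)(\cos\pi\alpha - \cos 2\pi\alpha) + 2 - \cos\pi\alpha - \cos 2\pi\alpha = 0$, a reading chosen because it reproduces the limits $\alpha \to 1$ for $\mu \to 1$ and $\alpha \to 2/3$ for $\mu \to 0$ or $\mu \to \infty$ stated in the text. The function names are hypothetical. The root in $[1/2, 1]$ is found by bisection; the closed form used for comparison follows from substituting $x = \cos\pi\alpha$ into the assumed equation, which then factors as $-(x-1)\,(2(m+1)x + m + 3)$ with $m = \frac{1}{2}(\mu + 1/\mu)$.

```python
import math

def corner_equation(alpha, mu):
    """Left-hand side of the assumed form of (22) for jump ratio mu = a+/a-."""
    m = 0.5 * (mu + 1.0 / mu)
    c1 = math.cos(math.pi * alpha)
    c2 = math.cos(2.0 * math.pi * alpha)
    return m * (c1 - c2) + 2.0 - c1 - c2

def singular_exponent(mu, iterations=80):
    """Bisection on [1/2, 1]: the left-hand side is positive left of the root."""
    lo, hi = 0.5, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if corner_equation(mid, mu) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def singular_exponent_closed_form(mu):
    """Root obtained from cos(pi*alpha) = -(m + 3) / (2 (m + 1)), m = (mu + 1/mu) / 2."""
    m = 0.5 * (mu + 1.0 / mu)
    x = max(-1.0, -(m + 3.0) / (2.0 * (m + 1.0)))  # clamp guards against rounding below -1
    return math.acos(x) / math.pi
```

Calling `singular_exponent(mu)` for a range of jump ratios shows the exponent decreasing monotonically from 1 toward 2/3 as the jump grows, and the result is the same for `mu` and `1/mu`.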


Finite Element Approximation 


The minimum of $G(\mathbf{u},p;f)$ is approximated using a Rayleigh-Ritz finite element
method. Let $\mathcal{T}^h$ be a triangulation of $\Omega$, which we assume to be quasi-uniform (cf.
[3, Chapter 4]), and let $\mathbf{W}^h$ and $V^h$ be appropriate finite-dimensional spaces. The
interface is required to be the union of edges of the triangulation. If the interface
is cutting through elements of the triangulation, then special techniques have to be
considered in order to average the parameters properly, which complicates the whole
approach. We do not address this task or the problems associated with it here, but
instead assume that the interfaces are restricted to edges of the triangulation. For the
sake of exposition, we also assume that each segment of the interface curves is parallel to
one of the coordinate axes. It is easy to see that the following development of the finite
element approach can be generalized to isoparametric elements, where the interface
curves are logically aligned with coordinate axes.

It is desirable, in general, to use conforming finite elements, where the
finite-dimensional spaces satisfy $\mathbf{W}^h \subset \mathbf{W}$ and $V^h \subset V$. Along straight segments of the
interface curve, this can be accomplished by enforcing condition (18) on the finite
element basis functions. Using bilinear finite elements on rectangles, for example, a basis
function for $u_1$ at a node on a horizontal interface segment is continuous in the
$x_1$-direction and has a jump of size $a^+/a^-$ in the $x_2$-direction. Such a basis function for $u_1$
at a node on a vertical interface segment is continuous (in both coordinate directions).
Under the assumption that all the interface curves are straight lines which do not
intersect each other (we will address the case of interface corners or cross-points later),
we can therefore construct piecewise bilinear finite element spaces:

$$\begin{aligned}
V^h &= \{ q \in V : q|_T \text{ bilinear on } T \text{ for all } T \in \mathcal{T}^h \} , \\
\mathbf{W}^h &= \{ \mathbf{v} \in H(\mathrm{div};\Omega) \cap H(\mathrm{curl}\,a;\Omega) : v_i|_T \text{ bilinear on } T \text{ for all } T \in \mathcal{T}^h \} .
\end{aligned}$$


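The finite element approximation $(\mathbf{u}^h, p^h) \in \mathbf{W}^h \times V^h$ is then defined as the solution
of the minimization problem

$$G(\mathbf{u}^h, p^h; f) = \min_{(\mathbf{v}^h, q^h) \in \mathbf{W}^h \times V^h} G(\mathbf{v}^h, q^h; f) . \tag{23}$$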


One of the main practical advantages of the least-squares finite element approach over
other variational formulations consists in the fact that the minimum of the functional
constitutes an a posteriori error measure. This follows from the general relation between
the least-squares functional and corresponding bilinear form. The main point here is
the fact that the least-squares functional is zero at the solution $(\mathbf{u},p)$, which leads to

$$\begin{aligned}
G(\mathbf{u}^h, p^h; f) &= G(\mathbf{u}^h, p^h; f) - G(\mathbf{u}, p; f) \\
&= \mathcal{F}(\mathbf{u}^h, p^h; \mathbf{u}^h, p^h) + 2 (f, \nabla \cdot \mathbf{u}^h)_{0,\Omega} - \mathcal{F}(\mathbf{u}, p; \mathbf{u}, p) - 2 (f, \nabla \cdot \mathbf{u})_{0,\Omega} \\
&= \mathcal{F}(\mathbf{u} - \mathbf{u}^h, p - p^h; \mathbf{u} - \mathbf{u}^h, p - p^h) .
\end{aligned}$$

Under the above assumptions, we get the following convergence result for the finite
element approximation.

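Theorem 2 Assume that for $(\mathbf{u},p)$, the solution of (10), we have
$(\mathbf{u},p)|_{\Omega_i} \in (H^{1+\delta}(\Omega_i))^3$ for some $\delta \in (0,1]$ and for $i = 1, \ldots, J$. Let
$(\mathbf{u}^h, p^h) \in \mathbf{W}^h \times V^h$ be the solution of (23). Then

$$|||(\mathbf{u},p) - (\mathbf{u}^h, p^h)||| \le C h^{\delta} \sum_{i=1}^{J} \left( \|\mathbf{u}\|_{1+\delta,\Omega_i} + \|\sqrt{a}\, p\|_{1+\delta,\Omega_i} \right) \tag{24}$$

where the constant $C$ is independent of $h$ and of the size of the jumps in $\{\omega_i\}$.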

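Proof. From Theorem 1 and Cea's Lemma (see, for example, [3, Theorem 2.8.1]),
we obtain

$$|||(\mathbf{u},p) - (\mathbf{u}^h, p^h)||| \le \frac{\gamma_2}{\gamma_1} \min_{(\mathbf{v}^h, q^h) \in \mathbf{W}^h \times V^h} |||(\mathbf{u},p) - (\mathbf{v}^h, q^h)||| .$$

Moreover, for $(\mathbf{v},q) \in \mathbf{W} \times V$, we have

$$\begin{aligned}
|||(\mathbf{v},q)|||^2 &= \|\nabla \cdot \mathbf{v}\|_{0,\Omega}^2 + \|a\,\nabla \times (\mathbf{v}/a)\|_{0,\Omega}^2 + \|\mathbf{v}/\sqrt{a}\|_{0,\Omega}^2 + \|\sqrt{a}\,\nabla q\|_{0,\Omega}^2 \\
&= \sum_{i=1}^{J} \left( \|\nabla \cdot \mathbf{v}\|_{0,\Omega_i}^2 + \|a\,\nabla \times (\mathbf{v}/a)\|_{0,\Omega_i}^2 + \|\mathbf{v}/\sqrt{a}\|_{0,\Omega_i}^2 + \|\sqrt{a}\,\nabla q\|_{0,\Omega_i}^2 \right) \\
&\le C_1 \sum_{i=1}^{J} \left( \|\nabla \cdot \mathbf{v}\|_{0,\Omega_i}^2 + \|\nabla \times \mathbf{v}\|_{0,\Omega_i}^2 + \|\mathbf{v}/\sqrt{a}\|_{0,\Omega_i}^2 + \|\sqrt{a}\,\nabla q\|_{0,\Omega_i}^2 \right) .
\end{aligned}$$

Since by assumption $\mathbf{u}|_{\Omega_i} \in (H^1(\Omega_i))^2$ and, similarly, $\mathbf{v}^h|_{\Omega_i} \in (H^1(\Omega_i))^2$ for each
$\mathbf{v}^h \in \mathbf{W}^h$, then for $i = 1, \ldots, J$ we have

$$\|\nabla \cdot (\mathbf{u} - \mathbf{v}^h)\|_{0,\Omega_i}^2 + \|\nabla \times (\mathbf{u} - \mathbf{v}^h)\|_{0,\Omega_i}^2 \le C_2\, |\mathbf{u} - \mathbf{v}^h|_{1,\Omega_i}^2 .$$

This leads to

$$|||(\mathbf{u},p) - (\mathbf{v}^h, q^h)||| \le C_3 \sum_{i=1}^{J} \left( |\mathbf{u} - \mathbf{v}^h|_{1,\Omega_i} + \|(\mathbf{u} - \mathbf{v}^h)/\sqrt{a}\|_{0,\Omega_i} + \|\sqrt{a}\,(p - q^h)\|_{1,\Omega_i} \right) .$$

Standard interpolation properties of piecewise bilinear functions (see, for example, [3,
Theorems 12.3.3 and 12.3.12]) lead to

$$\|\mathbf{u} - \mathbf{v}^h\|_{1,\Omega_i} \le c_4 h^{\delta} \|\mathbf{u}\|_{1+\delta,\Omega_i} , \qquad \|p - q^h\|_{1,\Omega_i} \le c_5 h^{\delta} \|p\|_{1+\delta,\Omega_i} ,$$

which completes the proof. ∎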

If the interface curve is not a straight line, or, more generally, not sufficiently smooth,
then the finite element approximation becomes considerably more complicated. In the
preceding section we saw that, for the solution $(\mathbf{u},p)$ of (10), $\mathbf{u}$ has the singular behavior
shown in (20) and (21). It is easy to see that this implies $\mathbf{u}|_{\Omega_i} \notin (H^1(\Omega_i))^2$ for all
subregions $\Omega_i$ adjacent to the interface corner, and therefore the standard finite element
approximation results do not apply.
Moreover, in order to have $\mathbf{u}^h \in H(\mathrm{div},\Omega) \cap H(\mathrm{curl}\,a,\Omega)$ in the neighborhood of
an interface corner, it is necessary and sufficient to require $\mathbf{u}^h$ to have the form of
(20) and (21). In other words, in order to have conforming finite elements, we must
include a singular basis function at each interface corner (or cross-point). The tools
developed in the previous section allow us, in principle, to compute the exact shape of
such a singularity. Multiplied by a standard piecewise bilinear function, such a singular
function could then serve as a basis function at that point. A procedure of this type
is described in [14, Section 8.2] along with special techniques to solve the resulting
discrete system. However, this approach requires special stencils for these singular
points, which complicates the overall finite element approach. Instead, we consider an
alternative nonconforming finite element method, based on simple basis functions like
bilinears on rectangles.

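We construct $\mathbf{W}^h$ observing the fact that, for the right-hand side in (11) to be
defined, we must have $\mathbf{W}^h \subset H(\mathrm{div},\Omega)$. This implies that, for $\mathbf{u}^h \in \mathbf{W}^h$, $\mathbf{n}\cdot\mathbf{u}^h$
must be continuous across all interfaces. Now consider the bilinear finite element basis
function associated with the interface corner in Figure 1. For $\mathbf{u}^h \in \mathbf{W}^h \subset H(\mathrm{div},\Omega)$,
we must require that $u_1$ is continuous in the $x_1$-direction across the horizontal portion
of the interface; that $u_2$ is continuous in the $x_2$-direction across the vertical portion
of the interface; and that both $u_1, u_2$ are continuous elsewhere. From (18) we see
that $\mathbf{u} \in H(\mathrm{curl}\,a,\Omega)$ requires $u_2$ to have a jump across the vertical portion of the
interface, while $u_1$ must have a jump across the horizontal portion. This causes a
conflict at the corner. The finite-dimensional space $\mathbf{W}^h$ will, therefore, not be contained
in $H(\mathrm{curl}\,a,\Omega)$, in general, and $\mathbf{W}^h \times V^h \not\subset \mathbf{W} \times V$. In particular, the bilinear form
$\mathcal{F}(\cdot,\cdot;\cdot,\cdot)$ is not defined on $\mathbf{W}^h \times V^h$. For $\mathbf{u}, \mathbf{v} \in \mathbf{W} + \mathbf{W}^h$ and $p, q \in V + V^h$ we define
a modified least-squares bilinear form by

$$
\mathcal{F}^h(\mathbf{u},p;\mathbf{v},q) = (\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p,\ \mathbf{v}/\sqrt{a} - \sqrt{a}\,\nabla q)_{0,\Omega}
+ (\nabla\cdot\mathbf{u},\nabla\cdot\mathbf{v})_{0,\Omega} + \sum_{i=1}^{J} (\nabla\times\mathbf{u},\nabla\times\mathbf{v})_{0,\Omega_i} . \tag{25}
$$

On $\mathbf{W} \times V$, this bilinear form coincides with $\mathcal{F}(\cdot,\cdot;\cdot,\cdot)$. The least-squares functional
corresponding to $\mathcal{F}^h(\cdot,\cdot;\cdot,\cdot)$ is

$$
G^h(\mathbf{u},p;f) = \|\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p\|_{0,\Omega}^2 + \|\nabla\cdot\mathbf{u} + f\|_{0,\Omega}^2 + \sum_{i=1}^{J} \|\nabla\times\mathbf{u}\|_{0,\Omega_i}^2 . \tag{26}
$$

Let $(\mathbf{u},p) \in \mathbf{W} \times V$ be the solution of (10), and let $(\mathbf{u}^h,p^h) \in \mathbf{W}^h \times V^h$ be defined
by

$$
G^h(\mathbf{u}^h,p^h;f) = \min_{(\mathbf{v}^h,q^h) \in \mathbf{W}^h \times V^h} G^h(\mathbf{v}^h,q^h;f) . \tag{27}
$$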


Recall that, at an interface corner, $\mathbf{u}$ has a singularity of the form given in (20) and
(21). This implies that we cannot expect to approximate $\mathbf{u}$ to the same accuracy by
standard finite elements near a singularity as elsewhere in $\Omega$. Moreover, since our finite
element subspace $\mathbf{W}^h \times V^h$ is not contained in the space $\mathbf{W} \times V$ in which we have shown
ellipticity, the relatively large error near a singularity will degrade the finite element
approximation in the entire region. This phenomenon is reflected by the fact that,
in the presence of singularities, $G^h(\mathbf{u}^h,p^h;f)$ does not decrease as $h$ is made smaller.
We will observe this behavior later in our computational experiments. It is therefore
necessary to introduce a weight function which decreases near the singular point. The
proper choice of weighting is motivated by the form of the singularity.

In particular, (19), (20), and (21) imply $\nabla\mathbf{u} \sim r^{\alpha-2}$ in the neighborhood of the
singularity. If $T^h$ denotes an element of the triangulation such that the interface
corner appears as one of its vertices, then

$$
\| r^{2-\alpha}\, \nabla(u_i - v_i^h) \|_{0,T^h}^2 = O(h^2) .
$$

If the right-hand side $f$ and the restriction of $a$ to $\Omega_i$ are sufficiently smooth, then we
know that $\mathbf{u} \in H^2_{\mathrm{loc}}(\Omega_i)$, i.e., $\mathbf{u} \in [H^2(Q)]^2$ for any compact $Q \subset \Omega_i$. This implies

that $\mathbf{v}^h \in \mathbf{W}^h$ exists such that

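The other terms in (26) can be treated in a similar way, which motivates the definition
of the weighted least-squares functional

$$
G^h_\omega(\mathbf{u},p;f) = \|\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p\|_{0,h,1-\alpha,\Omega}^2
+ \|\nabla\cdot\mathbf{u} + f\|_{0,h,2-\alpha,\Omega}^2 + \sum_{i=1}^{J} \|\nabla\times\mathbf{u}\|_{0,h,2-\alpha,\Omega_i}^2 \tag{28}
$$

and corresponding bilinear form

$$
\mathcal{F}^h_\omega(\mathbf{u},p;\mathbf{v},q) = (\mathbf{u}/\sqrt{a} - \sqrt{a}\,\nabla p,\ \mathbf{v}/\sqrt{a} - \sqrt{a}\,\nabla q)_{0,h,1-\alpha,\Omega}
+ (\nabla\cdot\mathbf{u},\nabla\cdot\mathbf{v})_{0,h,2-\alpha,\Omega} + \sum_{i=1}^{J} (\nabla\times\mathbf{u},\nabla\times\mathbf{v})_{0,h,2-\alpha,\Omega_i} . \tag{29}
$$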


The inner product $(\cdot,\cdot)_{0,h,\beta,\Omega}$ is defined as

$$
(\mathbf{v},\mathbf{w})_{0,h,\beta,\Omega} = (\omega_\beta \mathbf{v},\ \omega_\beta \mathbf{w})_{0,\Omega}
$$

with the weight function $\omega_\beta$ constructed in the following way: Consider a sequence of
triangulations $\{\mathcal{T}^{h_l},\ l = 0,\ldots,L\}$ with $H = h_0 > h_1 > \cdots > h_L = h$. Let $\Omega^l$ denote
the union of all elements $T^{h_l} \in \mathcal{T}^{h_l}$ with the singular point as one of their vertices.
The weight function is defined as

( for X G 

U, (x) = i /if for X G ' , / = 1.L , (30) 

\ 1 for X G . 


Let $(\mathbf{u}^h_\omega,p^h_\omega) \in \mathbf{W}^h \times V^h$ be the solution of

$$
G^h_\omega(\mathbf{u}^h_\omega,p^h_\omega;f) = \min_{(\mathbf{v}^h,q^h) \in \mathbf{W}^h \times V^h} G^h_\omega(\mathbf{v}^h,q^h;f) . \tag{31}
$$

In the final section of this paper we will demonstrate, by means of numerical results, that
the weighted functional $G^h_\omega(\mathbf{u}^h_\omega,p^h_\omega;f)$ actually decreases regularly as the triangulation
is refined. Note, however, that this does not mean that the error $\mathbf{u} - \mathbf{u}^h_\omega$ is small
throughout the region $\Omega$. In particular, the pointwise accuracy usually deteriorates
near singularities. This suggests that the weighted functional should be combined with
local refinement techniques to guarantee satisfactory resolution in the entire region.
Multilevel refinement techniques are especially effective in this context.


Multilevel Algorithms 

Consider the sequence of triangulations $\{\mathcal{T}^{h_l},\ l = 0,\ldots,L\}$ introduced earlier.
Associated with each triangulation $\mathcal{T}^{h_l}$ is the finite element space $\mathbf{W}^{h_l} \times V^{h_l}$, which we
may also denote by $\mathbf{W}_l \times V_l$. This leads to a nested sequence of spaces

$$
\mathbf{W}_0 \times V_0 \subset \mathbf{W}_1 \times V_1 \subset \cdots \subset \mathbf{W}_L \times V_L = \mathbf{W}^h \times V^h .
$$

On each level $l$, $0 \le l \le L$, an operator $\mathcal{F}_l : \mathbf{W}_l \times V_l \to \mathbf{W}_l \times V_l$ is defined by

$$
((\mathcal{F}_l(\mathbf{u},p);(\mathbf{v},q))) = \mathcal{F}(\mathbf{u},p;\mathbf{v},q) \quad \text{for all } (\mathbf{v},q) \in \mathbf{W}_l \times V_l ,
$$

where the inner product $((\cdot\,;\cdot))$ is given by

$$
(((\mathbf{u},p);(\mathbf{v},q))) = (\mathbf{u},\mathbf{v})_{0,\Omega} + (\sqrt{a}\,p,\ \sqrt{a}\,q)_{0,\Omega} .
$$

In terms of the operator $\mathcal{F}_l$, the discrete problem (23) can be written as

$$
\mathcal{F}_l(\mathbf{u}_l,p_l) = F_l \tag{32}
$$

where the right-hand side is defined by $((F_l;(\mathbf{v},q))) = -(f,\nabla\cdot\mathbf{v})_{0,\Omega}$ for all $(\mathbf{v},q) \in \mathbf{W}_l \times V_l$.
For the solution of (32), it is natural to use an iterative method since this requires
only a computational procedure for the action of the operator $\mathcal{F}_l$.
The cost for one call of such a procedure is proportional to the number of unknowns
$N = O(h^{-2})$.

The conjugate gradient method (cf. [13, Section 8.7]) computes its iterates
$(\mathbf{u}_l^{(n)},p_l^{(n)}) \in \mathbf{W}_l \times V_l$ in the Krylov subspace

$$
K_n(F_l,\mathcal{F}_l) = \mathrm{span}\{F_l,\ \mathcal{F}_l F_l,\ \ldots,\ \mathcal{F}_l^{\,n-1} F_l\}
$$

according to the minimization property

$$
G(\mathbf{u}_l^{(n)},p_l^{(n)};f) = \min_{(\mathbf{v}_l,q_l) \in K_n(F_l,\mathcal{F}_l)} G(\mathbf{v}_l,q_l;f) .
$$

Since the condition number of $\mathcal{F}_l$ is proportional to $O(h_l^{-2})$ (cf. [5, Theorem 3.2]), the
number of conjugate gradient iterations required to achieve a certain accuracy grows like
$O(h_l^{-1})$ (cf. [13, Section 8.7]). The overall computational complexity to solve a discrete
problem on $\mathcal{T}^{h_l}$ using the conjugate gradient method therefore grows like $O(h_l^{-3})$.
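
The growth of the iteration count under refinement can be observed on a small model problem. The following is a minimal sketch, not the least-squares operator of this paper: it assumes the standard 5-point Laplacian on the unit square (whose condition number also grows like $h^{-2}$) with right-hand side $f = 1$, and the function names `apply_laplacian`, `dot`, and `cg_iterations` are hypothetical.

```python
def apply_laplacian(v, h):
    """Apply the 5-point Laplacian with zero Dirichlet data to an n-by-n grid."""
    n = len(v)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 4.0 * v[i][j]
            if i > 0:
                s -= v[i - 1][j]
            if i < n - 1:
                s -= v[i + 1][j]
            if j > 0:
                s -= v[i][j - 1]
            if j < n - 1:
                s -= v[i][j + 1]
            out[i][j] = s / (h * h)
    return out


def dot(a, b):
    """Euclidean inner product of two grids stored as nested lists."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def cg_iterations(n, tol=1e-8, max_iter=10000):
    """Count CG iterations needed to reduce the residual norm by the factor tol."""
    h = 1.0 / (n + 1)
    r = [[1.0] * n for _ in range(n)]   # residual for the zero initial guess (f = 1)
    p = [row[:] for row in r]           # first search direction
    x = [[0.0] * n for _ in range(n)]
    rr = dot(r, r)
    target = tol * tol * rr
    for k in range(1, max_iter + 1):
        ap = apply_laplacian(p, h)
        alpha = rr / dot(p, ap)
        for i in range(n):
            for j in range(n):
                x[i][j] += alpha * p[i][j]
                r[i][j] -= alpha * ap[i][j]
        rr_new = dot(r, r)
        if rr_new <= target:
            return k
        beta = rr_new / rr
        p = [[r[i][j] + beta * p[i][j] for j in range(n)] for i in range(n)]
        rr = rr_new
    raise RuntimeError("CG did not converge")


if __name__ == "__main__":
    for n in (7, 15, 31):
        print(f"h = 1/{n + 1}: {cg_iterations(n)} iterations")
```

Each halving of $h$ quadruples the number of unknowns and also increases the iteration count, which is the mechanism behind the $O(h^{-3})$ work estimate for unpreconditioned conjugate gradients.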

Optimal computational complexity can be achieved, under certain assumptions
on $\mathcal{F}(\cdot,\cdot;\cdot,\cdot)$, by a full multigrid algorithm. The basic ingredients for multilevel
methods are the projection operators $\mathcal{P}_l, \mathcal{Q}_l : \mathbf{W}^h \times V^h \to \mathbf{W}_l \times V_l$, which are given by

$$
\mathcal{F}(\mathcal{P}_l(\mathbf{u},p);(\mathbf{v},q)) = \mathcal{F}((\mathbf{u},p);(\mathbf{v},q)) \quad \text{for all } (\mathbf{v},q) \in \mathbf{W}_l \times V_l
$$

and

$$
((\mathcal{Q}_l(\mathbf{u},p);(\mathbf{v},q))) = (((\mathbf{u},p);(\mathbf{v},q))) \quad \text{for all } (\mathbf{v},q) \in \mathbf{W}_l \times V_l ,
$$

and smoothing operators $\mathcal{R}_l : \mathbf{W}_l \times V_l \to \mathbf{W}_l \times V_l$ representing iterations on level $l$.
With these tools, standard multilevel algorithms can be constructed (see [5, Section 4]
for further details). A detailed study of the convergence properties of multilevel methods
for first-order system least-squares applied to problems with discontinuous coefficients
will be given in [9].


Computational Experiments 

In our examples, we consider (3) on the unit square $\Omega = \{(x_1,x_2) \in \mathbb{R}^2 : 0 <
x_1,x_2 < 1\}$, with $f = 1$ and $\Gamma_D = \partial\Omega$. We show the results of two sets of experiments,
one with a smooth interface curve and the other with an interface corner causing a
singularity in $\mathbf{u}$.

Example 1. In this example, the interface curve is a straight line, so no singularity
occurs. We consider

$$
a(x_1,x_2) = \begin{cases} a^+ , & 0 < x_2 < 0.5 , \\ a^- , & 0.5 < x_2 < 1 \end{cases} \tag{33}
$$

with different choices for the values of $a^+$ and $a^-$. The solution shown in Figure 3 was
obtained for $a^+ = 10$ and $a^- = 0.1$.

Table 1: Example 1: Minimum value (reduction factor) of the functional

  h      a+/a- = 1           a+/a- = 10          a+/a- = 10^2        a+/a- = 10^4
  1/8    2.42·10^-2          3.50·10^-2          4.13·10^-2          7.81·10^-2
  1/16   7.18·10^-3 (0.30)   1.07·10^-2 (0.31)   1.26·10^-2 (0.31)   2.30·10^-2 (0.29)
  1/32   2.08·10^-3 (0.29)   3.14·10^-3 (0.29)   3.71·10^-3 (0.29)   6.41·10^-3 (0.28)
  1/64   5.92·10^-4 (0.28)   9.05·10^-4 (0.29)   1.07·10^-3 (0.29)   1.75·10^-3 (0.27)

The computational results shown in Table 1 indicate that the approximation of the
solution improves nicely as the triangulation is refined, independently of the size of the
jumps. The reduction factor displayed in parentheses is the ratio of the minimum values
on the current and next coarser level. Note that the factors do not quite reach 0.25, which is
due to the lack of regularity at the corners of the subdomains. In fact, due to the corners
with interior angle $\pi/2$, we have neither $\mathbf{u}|_{\Omega_1} \in (H^2(\Omega_1))^2$ nor $\mathbf{u}|_{\Omega_2} \in (H^2(\Omega_2))^2$.
Consequently, the finite element approximation deteriorates near these corners. In contrast to
the situation at singularities, however, this behavior does not contaminate the solution
elsewhere since the basis functions corresponding to these points are conforming.
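
The reduction factors in Table 1 can be recomputed directly from the tabulated minimum values. The short sketch below does this for the column $a^+/a^- = 1$; the helper name `reduction_factors` is hypothetical, and the input numbers are simply the table entries.

```python
import math


def reduction_factors(values):
    """Ratio of each minimum value to the one on the next coarser level."""
    return [round(fine / coarse, 2) for coarse, fine in zip(values, values[1:])]


# Table 1, column a+/a- = 1, for h = 1/8, 1/16, 1/32, 1/64.
column = [2.42e-2, 7.18e-3, 2.08e-3, 5.92e-4]

factors = reduction_factors(column)
# Observed order of the functional per halving of h; a factor of 0.25 would be order 2.
orders = [math.log2(1.0 / f) for f in factors]

if __name__ == "__main__":
    print(factors)
    print([round(o, 2) for o in orders])
```

A factor of exactly 0.25 would correspond to the functional decreasing like $h^2$; the observed orders stay slightly below 2, in line with the reduced regularity discussed above.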

Example 2. This example shows results for a problem with a singularity in $\mathbf{u}$. We
choose

$$
a(x_1,x_2) = \begin{cases} a^+ , & 0 < x_1, x_2 < 0.5 , \\ a^- , & \text{elsewhere} \end{cases} \tag{34}
$$

(see Figure 1) with different choices for the values of $a^+$ and $a^-$ (again with $a^+ = 10$
and $a^- = 0.1$ for the solution shown in Figure 4).

The exponents for this example with the three values for the coefficient jumps used
in Table 2 are given by $\alpha = 0.7317$, $0.6739$, and $0.6667$, respectively. Note that the last
number is very close to the value $\alpha = 2/3$ that one gets for a reentrant corner with
interior angle $3\pi/2$. Using the weighting described earlier with $H = 1/8$ leads to the
results listed in Table 2. The modified least-squares functional is again reduced nicely
and regularly as the triangulation is refined. Note that using the weighted functional
means that the pointwise approximation deteriorates close to the singular point, where
local refinement can be used if a better pointwise resolution is needed.
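
The quoted exponents can be reproduced numerically. The sketch below is an illustration under an assumption that is not taken from this section: it uses the standard separation-of-variables ansatz, symmetric about the diagonal through the corner, for a quarter-plane wedge with coefficient $a^+$ surrounded by a three-quarter-plane with coefficient $a^-$. Matching the function and its flux across the interface then gives the characteristic equation $(a^+/a^-)\tan(\alpha\pi/4) + \tan(3\alpha\pi/4) = 0$, whose root in $(2/3, 1]$ is located by bisection. The function name `corner_exponent` is hypothetical.

```python
import math


def corner_exponent(ratio, iterations=100):
    """Root in (2/3, 1] of ratio*tan(alpha*pi/4) + tan(3*alpha*pi/4) = 0.

    `ratio` is a+/a- with a+ on the quarter-plane wedge (assumed setup).
    """
    def g(alpha):
        return ratio * math.tan(alpha * math.pi / 4) + math.tan(3 * alpha * math.pi / 4)

    lo, hi = 2.0 / 3.0 + 1e-9, 1.0   # g < 0 just above 2/3, g >= 0 at 1 for ratio >= 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for ratio in (10.0, 1e2, 1e4):
        print(ratio, round(corner_exponent(ratio), 4))
```

As the ratio grows, the second tangent term must blow up to balance the first, which forces $3\alpha\pi/4 \to \pi/2$, i.e. $\alpha \to 2/3$, the reentrant-corner value mentioned above.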


Table 2: Example 2: Minimum value (reduction factor) of the weighted functional $G^h_\omega$

  h      a+/a- = 1           a+/a- = 10          a+/a- = 10^2        a+/a- = 10^4
  1/8    2.42·10^-2          3.74·10^-2          5.17·10^-2          1.20·10^-1
  1/16   7.18·10^-3 (0.30)   1.16·10^-2 (0.31)   1.58·10^-2 (0.31)   3.53·10^-2 (0.29)
  1/32   2.08·10^-3 (0.29)   3.43·10^-3 (0.30)   4.66·10^-3 (0.29)   9.84·10^-3 (0.28)
  1/64   5.92·10^-4 (0.28)   9.95·10^-4 (0.29)   1.34·10^-3 (0.29)   2.68·10^-3 (0.27)


Table 3: Example 2: Minimum value of the functional $G^h$

  h      a+/a- = 1     a+/a- = 10    a+/a- = 10^2  a+/a- = 10^4
  1/8    2.42·10^-2    4.36·10^-2    7.50·10^-2    1.62·10^-1
  1/16   7.18·10^-3    2.39·10^-2    5.49·10^-2    9.89·10^-2
  1/32   2.08·10^-3    2.07·10^-2    5.35·10^-2    8.86·10^-2
  1/64   5.92·10^-4    2.22·10^-2    5.66·10^-2    9.33·10^-2


In order to illustrate the necessity of modifying the functional in the neighborhood
of a singular point, we also computed the results for the unmodified functional $G^h$
instead of $G^h_\omega$. The numbers in Table 3 show that this functional is not satisfactorily
reduced in the course of refining the triangulation. Our numerical tests have shown
that minimizing the unmodified functional leads to poor finite element approximations.
Figure 2 shows the error with respect to the exact solution for $p$ for the weighted
functional and for the unmodified functional. For the unmodified functional,
the resulting error between the discrete and exact solution is relatively large in the entire
domain. This behavior seems to indicate that using the unmodified functional has the
effect of trying too hard to satisfy the first-order system (6) close to the singularity,
where it is impossible to get a good approximation with bilinear finite elements. For
the weighted functional, however, the error is smaller and mainly occurs in a rather
small neighborhood of the singular point.
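
The contrast between the two functionals is easy to quantify from the tabulated values. The sketch below compares level-to-level ratios for the column $a^+/a^- = 10$ of Tables 2 and 3; the helper name `level_ratios` is hypothetical and the numbers are the table entries as printed.

```python
def level_ratios(values):
    """Ratio of the functional minimum on each level to that on the next coarser level."""
    return [fine / coarse for coarse, fine in zip(values, values[1:])]


# Column a+/a- = 10 for h = 1/8, 1/16, 1/32, 1/64.
weighted = [3.74e-2, 1.16e-2, 3.43e-3, 9.95e-4]     # Table 2
unmodified = [4.36e-2, 2.39e-2, 2.07e-2, 2.22e-2]   # Table 3

weighted_ratios = level_ratios(weighted)
unmodified_ratios = level_ratios(unmodified)

if __name__ == "__main__":
    print([round(r, 2) for r in weighted_ratios])
    print([round(r, 2) for r in unmodified_ratios])
```

The weighted functional keeps a ratio near 0.3 on every level, whereas the ratios for the unmodified functional drift toward 1 and exceed it on the finest level, i.e. the minimum stops decreasing.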


Figure 2: Example 2: Error in the pressure $p$ for the weighted functional (top) and the
unmodified functional (bottom).

Figure 4: Example 2: Pressure $p$ (top) and flux components $u_1$ and $u_2$.


ABSTRACT

Multigrid methods have proven to be efficient methods for solving partial differential 
equations (especially those of elliptic type). There is also growing experience with 
multigrid solvers for fluids problems, e.g., the Stokes and Navier-Stokes equations (using 
both finite element and finite difference discretizations). 

It is also well known that at the heart of any multigrid method is the smoother. In
this work we look at a smoother introduced by Brandt and Dinar (DGS relaxation),
and we examine some of its properties and consider some possible modifications to it.
It is well known that multigrid performance using DGS relaxation is sensitive to the
treatment of boundaries; this issue is addressed.


INTRODUCTION 


Multigrid methods have proven to be efficient methods for solving partial differential
equations (especially those of elliptic type). There is also growing experience with
multigrid solvers for fluids problems, e.g., the Stokes and Navier-Stokes equations (using
both finite element and finite difference discretizations). (See, e.g., [1]-[13] and the
references therein.)

It is also well known that at the heart of any multigrid method is the smoother. In
this work we look at a smoother (DGS relaxation; distributed Gauss-Seidel relaxation)
introduced in [2] and [3], as it applies to the Stokes problem. We examine some of its
properties and consider some possible modifications to it.

We consider the well-known Stokes equations; these equations, which model flows
with small velocities (creeping flows), may be viewed as a linear version of the Navier-
Stokes equations (which describe the flow of an incompressible, viscous fluid). The
following analysis extends to the (nonlinear) Navier-Stokes equations; this extension is the subject
of a forthcoming paper.

*This work was supported in part by a contract from American Computing, Inc.


The Stokes equations in $\Omega$, where $\Omega$ is a bounded domain in $\mathbb{R}^3$ (we assume
the domain is three-dimensional; obviously, the following results hold equally well for
two-dimensional domains), are

$$
-\Delta \mathbf{u} + \nabla p = \mathbf{f} \tag{1}
$$

and

$$
\nabla\cdot\mathbf{u} = 0 . \tag{2}
$$

On $\partial\Omega$ (the boundary of $\Omega$),

$$
\mathbf{u}|_{\partial\Omega} = \mathbf{g} . \tag{3}
$$

Here $\mathbf{u}$ and $p$ are the velocity and pressure, respectively (the unknowns). Given are the
body force $\mathbf{f}$ and the boundary condition $\mathbf{g}$.

There exists a large body of work which deals with the analysis and the development 
of various approximation methods of solutions for this system of equations. (See, e.g., 
[14]-[17] and the references cited therein.) Here we propose yet another such method 
which is based on a reformulation of the equations (suggested by DGS relaxation). 

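Remark 1. It is well known (see [15] and [16]) that given $\mathbf{f} \in H^{-1}(\Omega)^3$ and
$\mathbf{g} \in H^{1/2}(\partial\Omega)^3$ with $\int_{\partial\Omega} \mathbf{g}\cdot\mathbf{n}\,ds = 0$ the Stokes equations (1)-(3) have a unique solution
$(\mathbf{u},p) \in H^1(\Omega)^3 \times L_0^2(\Omega)$.

Throughout the paper we assume that $\Omega$ is a bounded, simply-connected domain in
$\mathbb{R}^3$ which is of class $C^{1,1}$ or is a convex polyhedron. (See [16] or [18].) The boundary of
the domain is denoted $\partial\Omega$ and $\mathbf{n}$ is the unit outward-pointing normal vector to $\partial\Omega$. Here
and in the sequel $H^s(\Omega)$ ($s$ a positive integer) is the usual $L^2(\Omega)$-based Sobolev space,
$H^{1/2}(\partial\Omega)$ is the trace space of $H^1(\Omega)$, and $H^{-1/2}(\partial\Omega)$ is its dual. (See [18].) Also,

$$
L_0^2(\Omega) = \left\{ p \in L^2(\Omega) : \int_\Omega p\,dx = 0 \right\}
$$

(i.e., it is the subspace of $L^2$-functions which have zero mean; see [16] and [17]). We
also introduce the following subspaces of $H^{1/2}(\partial\Omega)^3$ and $H^1(\Omega)^3$ (see [19] and [16]):

$$
H_n^{1/2}(\partial\Omega)^3 := \{ \mathbf{t} \in H^{1/2}(\partial\Omega)^3 : \mathbf{t}\cdot\mathbf{n} = 0 \}
$$

and

$$
H_n^1(\Omega)^3 := \{ \mathbf{v} \in H^1(\Omega)^3 : \mathbf{v}\cdot\mathbf{n}|_{\partial\Omega} = 0 \} .
$$

On $H_0^1(\Omega)^3$ (the space of functions with zero trace on the boundary) and on $H_n^1(\Omega)^3$,

$$
\left( \|\nabla\times(\cdot)\|_0^2 + \|\nabla\cdot(\cdot)\|_0^2 \right)^{1/2}
$$

is a norm equivalent to the $H^1$-norm (due to the existence of a Poincaré-type inequality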
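for domains such as those discussed above; see, e.g., [16]). Here $\|\cdot\|_s$ denotes the $H^s$-norm
($s = 0$ for $L^2$).

The Stokes equations can be formally written as the system

$$
L \begin{bmatrix} \mathbf{u} \\ p \end{bmatrix}
= \begin{bmatrix} -\Delta & \nabla \\ -\nabla\cdot & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{u} \\ p \end{bmatrix}
= \begin{bmatrix} \mathbf{f} \\ 0 \end{bmatrix} ,
\qquad \mathbf{u}|_{\partial\Omega} = \mathbf{g} .
$$

DGS relaxation may be viewed as Gauss-Seidel relaxation on a right preconditioned
system or Gauss-Seidel relaxation on an equation with transformed variables. The
change of variables (up to a sign change) as described in [2] and [3] (also in [13]) is
given as

$$
LM \begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix}
= \begin{bmatrix} -\Delta & 0 \\ -\nabla\cdot & -\Delta \end{bmatrix}
\begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix}
= \begin{bmatrix} \mathbf{f} \\ 0 \end{bmatrix} .
$$

It is easily seen that the (so called) distribution matrix $M$ (the right preconditioner) is

$$
M = \begin{bmatrix} I & \nabla \\ 0 & \Delta \end{bmatrix} .
$$

Formally, $M^{-1}$, the inverse change of variables, is given by

$$
M^{-1} = \begin{bmatrix} I & -\nabla\Delta^{-1} \\ 0 & \Delta^{-1} \end{bmatrix} .
$$

So the change of variables is given by

$$
\begin{bmatrix} \mathbf{u} \\ p \end{bmatrix}
= \begin{bmatrix} I & \nabla \\ 0 & \Delta \end{bmatrix}
\begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix}
\qquad \text{or} \qquad
\begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix}
= \begin{bmatrix} I & -\nabla\Delta^{-1} \\ 0 & \Delta^{-1} \end{bmatrix}
\begin{bmatrix} \mathbf{u} \\ p \end{bmatrix} .
$$

Thus we end up with the equations

$$
-\Delta\hat{\mathbf{u}} = \mathbf{f}
$$

and

$$
-\Delta\hat{p} = \nabla\cdot\hat{\mathbf{u}} .
$$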
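
The triangular structure of the transformed operator can be checked formally with Fourier symbols. The following sketch is an illustration under assumptions: it ignores boundaries entirely (the very issue discussed next), works in two space dimensions for brevity, and replaces $\nabla$ by $i\mathbf{k}$ and $\Delta$ by $-|\mathbf{k}|^2$ for a single mode $e^{i\mathbf{k}\cdot\mathbf{x}}$. The function names are hypothetical. It also includes the symbol of the distribution matrix with $\Delta^{-1}\nabla$ in the upper right block, for comparison.

```python
def matmul(a, b):
    """Product of two small dense matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def stokes_symbol(k1, k2):
    """Symbol of L = [[-Laplace, grad], [-div, 0]] for the mode exp(i k.x)."""
    ksq = k1 * k1 + k2 * k2
    return [[ksq, 0, 1j * k1],
            [0, ksq, 1j * k2],
            [-1j * k1, -1j * k2, 0]]


def dgs_symbol(k1, k2):
    """Symbol of the distribution matrix M = [[I, grad], [0, Laplace]]."""
    ksq = k1 * k1 + k2 * k2
    return [[1, 0, 1j * k1],
            [0, 1, 1j * k2],
            [0, 0, -ksq]]


def mdgs_symbol(k1, k2):
    """Symbol of M = [[I, Laplace^{-1} grad], [0, I]]."""
    ksq = k1 * k1 + k2 * k2
    return [[1, 0, -1j * k1 / ksq],
            [0, 1, -1j * k2 / ksq],
            [0, 0, 1]]


if __name__ == "__main__":
    k1, k2 = 2.0, 3.0
    for name, m in (("DGS", dgs_symbol(k1, k2)), ("MDGS", mdgs_symbol(k1, k2))):
        lm = matmul(stokes_symbol(k1, k2), m)
        print(name, [[complex(entry) for entry in row] for row in lm])
```

In both cases the velocity-pressure coupling block of $LM$ vanishes, so the transformed system is lower triangular; the lower right entry is the symbol of $-\Delta$ in the first case and of $-I$ in the second.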

An obvious obstacle in this approach is the lack of boundary conditions on $\hat{\mathbf{u}} = \mathbf{u} -
\nabla\Delta^{-1}p = \mathbf{u} - \nabla\hat{p}$ and on $\hat{p} = \Delta^{-1}p$. Obviously we cannot specify $\nabla\hat{p}$ on the boundary
(one would like to do that since $\mathbf{u}|_{\partial\Omega} = \mathbf{g}$ is given), since this would result in an
overdetermined system for $\hat{p}$. Note that even if a boundary condition for $\hat{p}$ were derived
and we were to derive a boundary condition for $\hat{\mathbf{u}}$, this boundary condition for $\hat{\mathbf{u}}$ would
involve $\hat{p}$ (namely, $\nabla\hat{p}$). Thus we would end up with a system of equations that are
coupled through the boundary conditions. (See [4].)

Thus it is proposed (in [2] and [3]) that this system be solved iteratively (with no 
mention of the boundary conditions to be used); that is, we perform a Gauss-Seidel 
step on the transformed system and then perform the inverse change of variables. In 
practice we only work with the original variables (the new variables are introduced 
only to describe the method). In fact, some ad hoc modifications to the method are 
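proposed in [13]; these improve the method in the presence of boundaries.

An obvious question is whether other changes of variables may yield a similar iteration
scheme. (See [5].) The most obvious change of variables that comes to mind will
avoid forming the Laplacian and inverse Laplacian in the equation for the pressure; it
will therefore be given by the following distribution matrix:

$$
M = \begin{bmatrix} I & \Delta^{-1}\nabla \\ 0 & I \end{bmatrix} .
$$

Formally, the inverse of the distribution matrix is

$$
M^{-1} = \begin{bmatrix} I & -\Delta^{-1}\nabla \\ 0 & I \end{bmatrix} .
$$

Now

$$
LM = \begin{bmatrix} -\Delta & 0 \\ -\nabla\cdot & -I \end{bmatrix} ,
$$

so the change of variables is given by

$$
\begin{bmatrix} \mathbf{u} \\ p \end{bmatrix}
= \begin{bmatrix} I & \Delta^{-1}\nabla \\ 0 & I \end{bmatrix}
\begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix} ,
\qquad
\begin{bmatrix} \hat{\mathbf{u}} \\ \hat{p} \end{bmatrix}
= \begin{bmatrix} I & -\Delta^{-1}\nabla \\ 0 & I \end{bmatrix}
\begin{bmatrix} \mathbf{u} \\ p \end{bmatrix} .
$$

This change of variables will yield a relaxation method which we call MDGS (modified
DGS) relaxation.

Thus we end up with the equations

$$
-\Delta\hat{\mathbf{u}} = \mathbf{f}
$$

and

$$
-\hat{p} = \nabla\cdot\hat{\mathbf{u}} .
$$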

An obvious advantage of this method is that there are no additional boundary 
conditions which must be imposed (or, more precisely, we may impose the boundary 
condition u|an = g, and no boundary condition is needed for p). A drawback of the 
method is that it is more complicated (since the change of variables now involves an 
inverse Laplacian, although this can be approximated locally). This alternative is very 
similar to an iteration for Uzawa’s method; see [6], [20], and [21]. (See also [14] and 
[16].) 

We abandon, for the time being, any further discussion of DGS (and MDGS) relaxation
and consider a related alternate formulation of the Stokes problem.


ALTERNATE FORMULATION 


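We consider the following formulation for the Stokes problem:

$$
-\Delta\mathbf{v} = \mathbf{f} , \tag{4}
$$

$$
\nabla\cdot\boldsymbol{\Phi} = -\nabla\cdot\mathbf{v} , \tag{5}
$$

and

$$
\nabla\times\boldsymbol{\Phi} = 0 . \tag{6}
$$

With boundary conditions

$$
\mathbf{v}|_{\partial\Omega} + \boldsymbol{\Phi}|_{\partial\Omega} = \mathbf{g} \tag{7}
$$

and

$$
\boldsymbol{\Phi}\cdot\mathbf{n}|_{\partial\Omega} = 0 . \tag{8}
$$

An alternate formulation with boundary condition $\boldsymbol{\Phi}\times\mathbf{n}|_{\partial\Omega} = 0$ (instead of (8)) may
be treated as well; details will appear in a forthcoming paper.

This formulation is equivalent to the Stokes equations when we set the velocity

$$
\mathbf{u} = \mathbf{v} + \boldsymbol{\Phi} \tag{9}
$$

and the pressure

$$
p = \nabla\cdot\boldsymbol{\Phi} . \tag{10}
$$

Note that if (8) is satisfied then $\int_\Omega \nabla\cdot\boldsymbol{\Phi}\,dx = 0$, and we may in fact (due to (5)) set
$p = -\nabla\cdot\mathbf{v}$.

Since $\boldsymbol{\Phi}$ satisfies

$$
\nabla\cdot\boldsymbol{\Phi} = -\nabla\cdot\mathbf{v} , \tag{11}
$$

$$
\nabla\times\boldsymbol{\Phi} = 0 , \tag{12}
$$

and

$$
\boldsymbol{\Phi}\cdot\mathbf{n}|_{\partial\Omega} = 0 , \tag{13}
$$

there exists $\phi$ such that $\boldsymbol{\Phi} = \nabla\phi$; moreover, $\phi$ is characterized as the solution of

$$
-\Delta\phi = -\nabla\cdot\boldsymbol{\Phi} = \nabla\cdot\mathbf{v} \tag{14}
$$

and

$$
\nabla\phi\cdot\mathbf{n}|_{\partial\Omega} = 0 . \tag{15}
$$

Because $\boldsymbol{\Phi} = \nabla\phi$, the fact that $\phi$ (the solution of (14) and (15)) is unique only up to
an additive constant does not cause any difficulties.

In light of the above, one may replace (4)-(8) by

$$
-\Delta\mathbf{v} = \mathbf{f} , \tag{16}
$$

$$
-\Delta\phi = \nabla\cdot\mathbf{v} , \tag{17}
$$

$$
\mathbf{v}|_{\partial\Omega} + \nabla\phi|_{\partial\Omega} = \mathbf{g} , \tag{18}
$$

and

$$
\nabla\phi\cdot\mathbf{n}|_{\partial\Omega} = 0 . \tag{19}
$$

The relationship of this formulation to DGS and MDGS is now patently clear if we
identify

$$
\hat{\mathbf{u}} = \mathbf{v} \quad \text{and} \quad p = \Delta\hat{p} = -\nabla\cdot\mathbf{v} = \nabla\cdot\boldsymbol{\Phi} = \Delta\phi .
$$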
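
Away from boundaries, the equivalence with the Stokes equations can be verified mode by mode. The sketch below is an illustration under a loud assumption: it works with a single Fourier mode $e^{i\mathbf{k}\cdot\mathbf{x}}$ on a periodic box, so the boundary conditions (18) and (19), which carry the real difficulty, play no role. It solves (16) and (17) for the mode, forms $\mathbf{u} = \mathbf{v} + \nabla\phi$ and $p = \Delta\phi$, and evaluates the Stokes residuals. The function names are hypothetical.

```python
def alternate_formulation_mode(k, f_hat):
    """Solve (16)-(17) for one Fourier mode and return (u_hat, p_hat)."""
    ksq = sum(kj * kj for kj in k)
    v_hat = [fj / ksq for fj in f_hat]                          # -Laplace v = f
    div_v = sum(1j * kj * vj for kj, vj in zip(k, v_hat))       # div v
    phi_hat = div_v / ksq                                       # -Laplace phi = div v
    u_hat = [vj + 1j * kj * phi_hat for kj, vj in zip(k, v_hat)]  # u = v + grad phi
    p_hat = -ksq * phi_hat                                      # p = Laplace phi
    return u_hat, p_hat


def stokes_residuals(k, f_hat, u_hat, p_hat):
    """Residuals of -Laplace u + grad p = f and div u = 0 for the same mode."""
    ksq = sum(kj * kj for kj in k)
    momentum = [ksq * uj + 1j * kj * p_hat - fj
                for kj, uj, fj in zip(k, u_hat, f_hat)]
    divergence = sum(1j * kj * uj for kj, uj in zip(k, u_hat))
    return momentum, divergence


if __name__ == "__main__":
    k = (1.0, 2.0, 3.0)
    f_hat = (1.0, -2.0, 0.5)
    u_hat, p_hat = alternate_formulation_mode(k, f_hat)
    momentum, divergence = stokes_residuals(k, f_hat, u_hat, p_hat)
    print(max(abs(m) for m in momentum), abs(divergence))
```

Both residuals vanish up to rounding, which mirrors the formal identities $\nabla\cdot\mathbf{u} = \nabla\cdot\mathbf{v} + \Delta\phi = 0$ and $-\Delta\mathbf{u} + \nabla p = -\Delta\mathbf{v} = \mathbf{f}$.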

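The advantage of this point of view is the availability of boundary conditions for
the various unknowns. A difficulty in this approach is the fact that the equations are
coupled through the boundary conditions; this situation is unavoidable, however (as
observed earlier). We also have the following theorem:

Theorem 2. The formulation (1)-(3) is equivalent to the formulation (4)-(8) and to
the formulation (16)-(19).


Proof: If $(\mathbf{u},p) \in H^1(\Omega)^3 \times L_0^2(\Omega)$ is a solution of (1)-(3) then let $\boldsymbol{\Phi}$ be the unique
solution of

$$
\nabla\cdot\boldsymbol{\Phi} = p , \qquad \nabla\times\boldsymbol{\Phi} = 0 , \qquad \text{and} \qquad \boldsymbol{\Phi}\cdot\mathbf{n}|_{\partial\Omega} = 0 .
$$

Note that $\nabla\cdot\mathbf{u} = 0$ and $\Delta\boldsymbol{\Phi} = \nabla\nabla\cdot\boldsymbol{\Phi}$ (due to the fact that $-\Delta\boldsymbol{\Phi} = \nabla\times\nabla\times\boldsymbol{\Phi} - \nabla\nabla\cdot\boldsymbol{\Phi}$
and $\nabla\times\boldsymbol{\Phi} = 0$); thus, $\Delta\boldsymbol{\Phi} = \nabla p$. Setting

$$
\mathbf{v} = \mathbf{u} - \boldsymbol{\Phi} ,
$$

it is easily seen that $(\mathbf{v},\boldsymbol{\Phi})$ satisfies (4)-(8). Conversely, if $(\mathbf{v},\boldsymbol{\Phi})$ satisfies (4)-(8), then
set

$$
\mathbf{u} = \mathbf{v} + \boldsymbol{\Phi} \qquad \text{and} \qquad p = \nabla\cdot\boldsymbol{\Phi} .
$$

Recall $\Delta\boldsymbol{\Phi} = \nabla p$; clearly $(\mathbf{u},p)$ satisfies (1)-(3).

It is well known that (5)-(6) and (8) are equivalent to (14) and (15), with the
identification $\boldsymbol{\Phi} = \nabla\phi$. (See, e.g., [16].) To complete the proof we observe the following:
if $(\mathbf{u},p)$ satisfies the Stokes equations and if we set

$$
-\Delta\phi = -p , \qquad \nabla\phi\cdot\mathbf{n}|_{\partial\Omega} = 0 , \qquad \text{and} \qquad \mathbf{v} = \mathbf{u} - \nabla\phi ,
$$

then $(\mathbf{v},\phi)$ so defined satisfies equations (16)-(19). Conversely, if $(\mathbf{v},\phi)$ satisfies
equations (16)-(19), set

$$
\mathbf{u} = \mathbf{v} + \nabla\phi \qquad \text{and} \qquad p = \Delta\phi ;
$$

then $(\mathbf{u},p)$ satisfies the Stokes problem (equations (1)-(3)). □


WEAK FORMULATION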


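Consider the following weak formulation: find $\mathbf{v}$, $\mathbf{s}$, and $\boldsymbol{\Phi}$ such that

$$
\mathbf{v} \in H^1(\Omega)^3 \ \text{with} \ \mathbf{v}\cdot\mathbf{n}|_{\partial\Omega} = \mathbf{g}\cdot\mathbf{n} , \quad \mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3 , \quad \boldsymbol{\Phi} \in H_n^1(\Omega)^3 , \tag{20}
$$

$$
\begin{aligned}
&\int_\Omega \{ \nabla\times\mathbf{v}\cdot\nabla\times\mathbf{w} + \nabla\cdot\mathbf{v}\,\nabla\cdot\mathbf{w} + \nabla\times\boldsymbol{\Phi}\cdot\nabla\times\boldsymbol{\Psi} + \nabla\cdot\boldsymbol{\Phi}\,\nabla\cdot\boldsymbol{\Psi} \}\,dx \\
&\quad + \int_\Omega \nabla\cdot\mathbf{v}\,\nabla\cdot\boldsymbol{\Psi}\,dx + (\mathbf{s},\mathbf{w})_{\partial\Omega} = (\mathbf{f},\mathbf{w})_\Omega \quad \forall \mathbf{w} \in H_n^1(\Omega)^3 ,\ \boldsymbol{\Psi} \in H_n^1(\Omega)^3 ,
\end{aligned} \tag{21}
$$

and

$$
(\mathbf{t},\mathbf{v}+\boldsymbol{\Phi})_{\partial\Omega} = (\mathbf{t},\mathbf{g})_{\partial\Omega} \quad \forall \mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3 . \tag{22}
$$

Here $(\cdot,\cdot)_\Omega$ and $(\cdot,\cdot)_{\partial\Omega}$ denote the duality pairings on $\Omega$ and on $\partial\Omega$, respectively.
Or equivalently, consider the following weak
formulation: find $\mathbf{v}$, $\mathbf{s}$, and $\phi$ such that

$$
\mathbf{v} \in H^1(\Omega)^3 \ \text{with} \ \mathbf{v}\cdot\mathbf{n}|_{\partial\Omega} = \mathbf{g}\cdot\mathbf{n} , \quad \mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3 , \quad
\phi \in H^1(\Omega) \ \text{with} \ \nabla\phi \in H_n^1(\Omega)^3 , \tag{23}
$$

$$
\begin{aligned}
&\int_\Omega \{ \nabla\times\mathbf{v}\cdot\nabla\times\mathbf{w} + \nabla\cdot\mathbf{v}\,\nabla\cdot\mathbf{w} + \Delta\phi\,\Delta\psi \}\,dx + \int_\Omega \nabla\cdot\mathbf{v}\,\Delta\psi\,dx + (\mathbf{s},\mathbf{w})_{\partial\Omega} \\
&\quad = (\mathbf{f},\mathbf{w})_\Omega \quad \forall \mathbf{w} \in H_n^1(\Omega)^3 ,\ \psi \in H^1(\Omega) \ \text{with} \ \nabla\psi \in H_n^1(\Omega)^3 ,
\end{aligned} \tag{24}
$$

and

$$
(\mathbf{t},\mathbf{v}+\nabla\phi)_{\partial\Omega} = (\mathbf{t},\mathbf{g})_{\partial\Omega} \quad \forall \mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3 . \tag{25}
$$

Theorem 3. Equations (20)-(22) and (23)-(25) are weak formulations for (4)-(8)
and (16)-(19), respectively.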


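Proof: Setting $\boldsymbol{\Psi} = 0$ and restricting $\mathbf{w} \in H_0^1(\Omega)^3$ in (21) we get that

$$
\int_\Omega \{ \nabla\times\mathbf{v}\cdot\nabla\times\mathbf{w} + \nabla\cdot\mathbf{v}\,\nabla\cdot\mathbf{w} \}\,dx = (\mathbf{f},\mathbf{w})_\Omega ,
$$

which implies that

$$
-\Delta\mathbf{v} = \mathbf{f}
$$

in $H^{-1}(\Omega)^3$; letting $\mathbf{w}$ be an arbitrary element of $H_n^1(\Omega)^3$ we get that

$$
\mathbf{s} = -\nabla\times\mathbf{v}\times\mathbf{n}|_{\partial\Omega}
$$

in $H_n^{-1/2}(\partial\Omega)^3$. Now setting $\mathbf{w} = 0$ and setting $\boldsymbol{\Psi}$ to be the solution of

$$
\nabla\cdot\boldsymbol{\Psi} = \nabla\cdot\boldsymbol{\Phi} + \nabla\cdot\mathbf{v} , \qquad \nabla\times\boldsymbol{\Psi} = 0 , \qquad \text{and} \qquad \boldsymbol{\Psi}\cdot\mathbf{n}|_{\partial\Omega} = 0 ,
$$

we get that

$$
\nabla\cdot\boldsymbol{\Phi} = -\nabla\cdot\mathbf{v}
$$

in $L^2(\Omega)$. Letting $\boldsymbol{\Psi}$ be an arbitrary element of $H_n^1(\Omega)^3$ we get that

$$
\nabla\times\boldsymbol{\Phi} = 0
$$

in $L^2(\Omega)^3$. Finally from (20) and (22) we obtain (7). The proof for the formulation
(23)-(25) proceeds similarly. □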

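For notational convenience, define

$$
\begin{aligned}
A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi})) &:= \int_\Omega \{ \nabla\times\mathbf{v}\cdot\nabla\times\mathbf{w} + \nabla\cdot\mathbf{v}\,\nabla\cdot\mathbf{w} + \nabla\times\boldsymbol{\Phi}\cdot\nabla\times\boldsymbol{\Psi} \}\,dx \\
&\qquad + \int_\Omega \nabla\cdot\boldsymbol{\Phi}\,\nabla\cdot\boldsymbol{\Psi}\,dx + \int_\Omega \nabla\cdot\mathbf{v}\,\nabla\cdot\boldsymbol{\Psi}\,dx , \\
B(\mathbf{s},(\mathbf{w},\boldsymbol{\Psi})) &:= (\mathbf{s},\mathbf{w})_{\partial\Omega} , \\
D(\mathbf{t},(\mathbf{v},\boldsymbol{\Phi})) &:= (\mathbf{t},\mathbf{v}+\boldsymbol{\Phi})_{\partial\Omega} , \\
F((\mathbf{w},\boldsymbol{\Psi})) &:= (\mathbf{f},\mathbf{w})_\Omega , \\
G(\mathbf{t}) &:= (\mathbf{t},\mathbf{g})_{\partial\Omega} ,
\end{aligned}
$$

and

$$
\begin{aligned}
a((\mathbf{v},\phi),(\mathbf{w},\psi)) &:= \int_\Omega \{ \nabla\times\mathbf{v}\cdot\nabla\times\mathbf{w} + \nabla\cdot\mathbf{v}\,\nabla\cdot\mathbf{w} + \Delta\phi\,\Delta\psi \}\,dx + \int_\Omega \nabla\cdot\mathbf{v}\,\Delta\psi\,dx , \\
b(\mathbf{s},(\mathbf{w},\psi)) &:= (\mathbf{s},\mathbf{w})_{\partial\Omega} , \\
d(\mathbf{t},(\mathbf{v},\phi)) &:= (\mathbf{t},\mathbf{v}+\nabla\phi)_{\partial\Omega} , \\
f((\mathbf{w},\psi)) &:= (\mathbf{f},\mathbf{w})_\Omega , \\
g(\mathbf{t}) &:= (\mathbf{t},\mathbf{g})_{\partial\Omega} .
\end{aligned}
$$

We denote

$$
\mathcal{H} := H^1(\Omega)^3 \times H_n^1(\Omega)^3
$$

and

$$
\mathcal{H}_n := H_n^1(\Omega)^3 \times H_n^1(\Omega)^3 .
$$

On these spaces we use the usual product norm.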

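With this notation we may write the weak formulations as follows: find $\mathbf{v}$, $\mathbf{s}$, and
$\boldsymbol{\Phi}$ such that

$$
\mathbf{v} \in H^1(\Omega)^3 \ \text{with} \ \mathbf{v}\cdot\mathbf{n}|_{\partial\Omega} = \mathbf{g}\cdot\mathbf{n} , \quad \mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3 , \quad \boldsymbol{\Phi} \in H_n^1(\Omega)^3 , \tag{26}
$$

$$
A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi})) + B(\mathbf{s},(\mathbf{w},\boldsymbol{\Psi})) = F((\mathbf{w},\boldsymbol{\Psi})) \quad \forall (\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{H}_n , \tag{27}
$$

and

$$
D(\mathbf{t},(\mathbf{v},\boldsymbol{\Phi})) = G(\mathbf{t}) \quad \forall \mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3 . \tag{28}
$$

Equivalently, find $\mathbf{v}$, $\mathbf{s}$, and $\phi$ such that

$$
\mathbf{v} \in H^1(\Omega)^3 \ \text{with} \ \mathbf{v}\cdot\mathbf{n}|_{\partial\Omega} = \mathbf{g}\cdot\mathbf{n} , \quad \mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3 , \quad
\phi \in H^1(\Omega) \ \text{with} \ \nabla\phi \in H_n^1(\Omega)^3 , \tag{29}
$$

$$
a((\mathbf{v},\phi),(\mathbf{w},\psi)) + b(\mathbf{s},(\mathbf{w},\psi)) = f((\mathbf{w},\psi))
\quad \forall \mathbf{w} \in H_n^1(\Omega)^3 ,\ \psi \in H^1(\Omega) \ \text{with} \ \nabla\psi \in H_n^1(\Omega)^3 , \tag{30}
$$

and

$$
d(\mathbf{t},(\mathbf{v},\phi)) = g(\mathbf{t}) \quad \forall \mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3 . \tag{31}
$$

Note that this weak formulation falls into the class of generalized saddle point
problems of the type considered in [22]. (See also [14] and [23].)

Lemma 4. The forms $A(\cdot,\cdot)$, $B(\cdot,\cdot)$, $D(\cdot,\cdot)$, $F(\cdot)$, and $G(\cdot)$ are continuous; that is,
positive constants $\lambda_A$, $\lambda_B$, $\lambda_D$, $\lambda_F$, and $\lambda_G$ exist such that

$$
|A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi}))| \le \lambda_A \|(\mathbf{v},\boldsymbol{\Phi})\|_{\mathcal{H}} \|(\mathbf{w},\boldsymbol{\Psi})\|_{\mathcal{H}} , \tag{32}
$$

$$
|B(\mathbf{s},(\mathbf{w},\boldsymbol{\Psi}))| \le \lambda_B \|\mathbf{s}\|_{-1/2} \|(\mathbf{w},\boldsymbol{\Psi})\|_{\mathcal{H}} , \tag{33}
$$

$$
|D(\mathbf{t},(\mathbf{v},\boldsymbol{\Phi}))| \le \lambda_D \|\mathbf{t}\|_{-1/2} \|(\mathbf{v},\boldsymbol{\Phi})\|_{\mathcal{H}} , \tag{34}
$$

$$
|F((\mathbf{w},\boldsymbol{\Psi}))| \le \lambda_F \|(\mathbf{w},\boldsymbol{\Psi})\|_{\mathcal{H}} , \tag{35}
$$

$$
|G(\mathbf{t})| \le \lambda_G \|\mathbf{t}\|_{-1/2} . \tag{36}
$$

Proof: The proof is an easy consequence of Hölder's inequality and the definition of
the forms. □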

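Define

$$
\mathcal{K}_B := \{ (\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{H}_n : B(\mathbf{s},(\mathbf{w},\boldsymbol{\Psi})) = 0 \ \ \forall \mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3 \}
$$

and

$$
\mathcal{K}_D := \{ (\mathbf{v},\boldsymbol{\Phi}) \in \mathcal{H}_n : D(\mathbf{t},(\mathbf{v},\boldsymbol{\Phi})) = 0 \ \ \forall \mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3 \} .
$$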


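Lemma 5. The forms $A(\cdot,\cdot)$, $B(\cdot,\cdot)$, and $D(\cdot,\cdot)$ satisfy some inf-sup conditions; in
particular, positive constants $\alpha$, $\beta$, and $\delta$ exist such that

$$
\inf_{(\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{K}_B} \ \sup_{(\mathbf{v},\boldsymbol{\Phi}) \in \mathcal{K}_D}
\frac{A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi}))}{\|(\mathbf{v},\boldsymbol{\Phi})\|_{\mathcal{H}} \|(\mathbf{w},\boldsymbol{\Psi})\|_{\mathcal{H}}} \ge \alpha , \tag{37}
$$

$$
\sup_{(\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{K}_B} A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi})) > 0 \quad \forall (\mathbf{v},\boldsymbol{\Phi}) \in \mathcal{K}_D \setminus \{0\} , \tag{38}
$$

$$
\inf_{\mathbf{s} \in H_n^{-1/2}(\partial\Omega)^3} \ \sup_{(\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{H}_n}
\frac{B(\mathbf{s},(\mathbf{w},\boldsymbol{\Psi}))}{\|\mathbf{s}\|_{-1/2} \|(\mathbf{w},\boldsymbol{\Psi})\|_{\mathcal{H}}} \ge \beta , \tag{39}
$$

and

$$
\inf_{\mathbf{t} \in H_n^{-1/2}(\partial\Omega)^3} \ \sup_{(\mathbf{v},\boldsymbol{\Phi}) \in \mathcal{H}_n}
\frac{D(\mathbf{t},(\mathbf{v},\boldsymbol{\Phi}))}{\|\mathbf{t}\|_{-1/2} \|(\mathbf{v},\boldsymbol{\Phi})\|_{\mathcal{H}}} \ge \delta . \tag{40}
$$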


Proof: The first condition (inequality (37)) follows from the observations that given 
(w, ’Jf) e /Cb, setting v = w — '4^ and ^ guarantees that (v, <&) G K,d, that a 
positive constant c exists so that ||(v,#)||b < c||(w,^f)||-B, and that 




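Given $(\mathbf{v},\boldsymbol{\Phi}) \in \mathcal{K}_D \setminus \{0\}$, set $\mathbf{w} = \mathbf{v} + \boldsymbol{\Phi}$ and $\boldsymbol{\Psi} = \boldsymbol{\Phi}$; then $(\mathbf{w},\boldsymbol{\Psi}) \in \mathcal{K}_B$; moreover,
it is easily seen that

$$
A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi})) \ge \tfrac{1}{2}\left( \|\nabla\times\mathbf{v}\|_0^2 + \|\nabla\times\boldsymbol{\Phi}\|_0^2 \right) + \|\nabla\cdot(\mathbf{v}+\boldsymbol{\Phi})\|_0^2 .
$$

Now if $\tfrac{1}{2}(\|\nabla\times\mathbf{v}\|_0^2 + \|\nabla\times\boldsymbol{\Phi}\|_0^2) + \|\nabla\cdot(\mathbf{v}+\boldsymbol{\Phi})\|_0^2 > 0$, then (38) holds. If this is not
the case (i.e., if $\tfrac{1}{2}(\|\nabla\times\mathbf{v}\|_0^2 + \|\nabla\times\boldsymbol{\Phi}\|_0^2) + \|\nabla\cdot(\mathbf{v}+\boldsymbol{\Phi})\|_0^2 = 0$), it easily follows that
$\mathbf{v} + \boldsymbol{\Phi} = 0$, and, because $(\mathbf{v},\boldsymbol{\Phi}) \ne 0$, then $\nabla\cdot\mathbf{v} \ne 0$. In this case we know (see [16])
that a $\mathbf{w} \in H_0^1(\Omega)^3$ exists with $\nabla\cdot\mathbf{w} = \nabla\cdot\mathbf{v}$; setting $\boldsymbol{\Psi} = 0$, we get that

$$
A((\mathbf{v},\boldsymbol{\Phi}),(\mathbf{w},\boldsymbol{\Psi})) \ge \|\nabla\cdot\mathbf{v}\|_0^2
$$

and conclude that the second condition holds.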

The third and fourth conditions (inequalities (39) and (40)) may be shown using 
the methods used in [24] to prove a similar inf-sup condition. □ 

Theorem 6. The weak problem (26)-(28) has a unique solution. 


Proof: This is a result of Lemma 4, Lemma 5, and the abstract theory detailed in [22] 
and [23]. □ 

It is an easy exercise to state, for (29)-(31), results analogous to those stated in 
Lemma 4, Lemma 5, and Theorem 6 . (Details will be given in a forthcoming paper.) 


DISCUSSION 


We point out that since the weak form of the problem falls into the class of generalized
saddle point problems introduced in [22] (see also [14] and [23]), one may carry
out finite element analysis for this problem in that framework. Such analysis yields
existence and uniqueness results for the discrete problem (approximate problem) and
optimal error estimates for finite element approximation schemes based on these weak
forms, provided that certain (discrete) inf-sup conditions hold. (Details will be given
in a forthcoming paper.)

An advantage of this formulation over the primitive variable (velocity-pressure)
formulation of the problem is the fact that it is relatively easy to construct finite
element spaces which satisfy the necessary inf-sup conditions. In fact there is complete
freedom in choosing the spaces for $\mathbf{v}$ and for $\boldsymbol{\Phi}$ (the spaces that approximate $H^1(\Omega)^3$ and
$H_n^1(\Omega)^3$); in view of the error estimates it is reasonable to choose the same finite element
space for both of these. Once these spaces have been chosen, we choose the space for $\mathbf{s}$
(the space approximating $H_n^{-1/2}(\partial\Omega)^3$) as the restriction to the boundary of elements of
the previous spaces (i.e., the trace space of the discrete spaces approximating $H^1(\Omega)^3$).
This choice for the discrete spaces guarantees that the necessary (discrete) inf-sup
conditions are satisfied. Details and examples from computations will appear in a
forthcoming paper.

Another question to be investigated is the implications for multigrid codes employing
DGS relaxation. Can these results be used in order to construct better smoothers
(particularly in the neighborhood of boundaries)? As stated earlier, the relationship
between this formulation and DGS relaxation is

$$
\hat{\mathbf{u}} = \mathbf{v} \quad \text{and} \quad \Delta\hat{p} = -\nabla\cdot\mathbf{v} = \nabla\cdot\boldsymbol{\Phi} ,
$$

but we also have that

$$
\mathbf{u} = \hat{\mathbf{u}} + \nabla\hat{p} = \mathbf{v} + \boldsymbol{\Phi} .
$$

Therefore it seems that when using DGS relaxation one alternative is to impose a
homogeneous Neumann boundary condition on $\hat{p}$ (when solving $-\Delta\hat{p} = \nabla\cdot\hat{\mathbf{u}}$) and the
nonhomogeneous Dirichlet boundary condition $\mathbf{g} - \nabla\hat{p}|_{\partial\Omega}$ on $\hat{\mathbf{u}}$ (when solving $-\Delta\hat{\mathbf{u}} = \mathbf{f}$).

Moreover it may prove advantageous to keep explicit track of $\hat{\mathbf{u}}$ and $\hat{p}$ on the boundary
and use their values in the iteration. This may yield better behavior of DGS
relaxation in the presence of boundaries.

DGS relaxation (the change of variables described in [2] and [3]) is introduced in
order to transform a saddle point problem into a problem which is definite. The fact
that the new problem is still indefinite (a saddle point problem) is masked by the
fact that the effects of the boundaries and boundary conditions have been neglected.
The previous analysis shows that we are still faced with an indefinite
problem. This must be taken into account when using this iterative scheme; one possible
implication is that it may be advantageous to use an inexact Uzawa-type iteration to
solve the problem.


SUMMARY


A numerical scheme to solve the unsteady Navier-Stokes equations is described. The scheme is 
fully implicit in time and is unconditionally stable (at least for first- and second-order discretizations 
of the physical time derivatives). With unconditional stability, the choice of the time step is based on 
the physical phenomena to be resolved rather than limited by numerical stability. This is especially 
important for high Reynolds number viscous flows, where the spatial variation of grid cell size can 
be as much as six orders of magnitude. 

A multigrid-multiblock, steady-state, three-dimensional Navier-Stokes solver, TLNS3D, was modified
to iteratively invert the equations at each physical time step. The implementation of this procedure
in TLNS3D is discussed. The implications of applying several popular turbulence models to unsteady 
flow are also considered. Numerical results are presented to show the application of the scheme to 
various two-dimensional turbulent flows. The results of a three-dimensional laminar flow calculation 
are also given. 


INTRODUCTION 


Although significant progress has been made in the last twenty years to numerically model many 
physical situations, most numerical schemes are limited to the prediction of steady flows. This 
limitation is particularly true in the field of computational fluid dynamics (CFD), where solutions to 
the Navier-Stokes equations for steady flows are now calculated on a regular basis. (See, for example, 
references [1-3].) An important factor that has led to the increased use of Navier-Stokes solvers is the
recent success in reducing the computer resources necessary to obtain converged solutions. Perhaps
the most promising work has been in the use of multigrid acceleration techniques. Convergence to
steady state has been shown in O[log(n)] work, where n represents the number of unknowns to be
solved. This reduction in computer requirements has made steady-state solutions affordable to the
practicing engineer.

However, many physical phenomena (e.g., separated flows, wake flows, buffet) are intrinsically 
unsteady. The solution of unsteady problems in CFD has been limited to simplified subsets of the 
Navier-Stokes equations (panel methods, potential-flow solvers, and some limited use of Euler equation 
solvers). Unsteady Navier-Stokes calculations have been too expensive for routine use. 

The present approach is to apply an iterative procedure for the solution of an implicit equation; 
thus, the approach is called an iterative-implicit method. The concept is not new; in fact, many of 
the methods developed in the field of linear algebra for inverting large matrices are iterative. Within
the field of CFD, similar work is discussed by Jameson [4] for unsteady flows and by Taylor, Ng, 
and Walters [5] for steady-state flows. The present approach is similar to that of Jameson in that 
a Runge-Kutta-based multigrid method is used to solve the implicit unsteady flow equations. The 
Navier-Stokes equations have been treated in the present work, and Jameson’s implementation has 
been modified so that the robustness of the scheme is dramatically increased. Later work by Belov, 
Martinelli, and Jameson [6] has incorporated the modifications used in the present work as given 
below and in reference [7]. 

A summary description of the implementation is given below. (Details of the implementation and 
analysis of the method are given in a previous paper [7].) A discussion of the use of current ‘steady’ 
turbulence models is then given. Numerical results from laminar and turbulent two-dimensional test 
problems are then presented, as well as the results from a three-dimensional laminar calculation. 


GOVERNING EQUATIONS 


In the present work, a modified version of the thin-layer Navier-Stokes (TLNS) equations is used
to model the flow. The equation set is obtained from the complete Reynolds-averaged Navier-Stokes
equations by retaining only the viscous diffusion terms normal to the solid surfaces. For a body-fitted
coordinate system (ξ, η, ζ) fixed in time, these equations can be written in the conservation-law form as

\frac{\partial (J^{-1} U)}{\partial T} = -\frac{\partial F}{\partial \xi} - \frac{\partial G}{\partial \eta} - \frac{\partial H}{\partial \zeta} + \frac{\partial F_v}{\partial \xi} + \frac{\partial G_v}{\partial \eta} + \frac{\partial H_v}{\partial \zeta} \qquad (1)

where U represents the conserved variable vector and F, G, and H represent the convective flux
vectors. In the above equation set F_v, G_v, and H_v represent the viscous flux vectors in the three
coordinate directions (ξ, η, ζ), and J is the Jacobian of the transformation. These equations represent
a more general form of the classical thin-layer equations introduced in reference [8] because
the diffusion terms in all three coordinate directions are included in this form. The Euler equations can
easily be recovered from equation (1) by simply dropping the last three terms on the right-hand side.
The effects of turbulence are modeled through an eddy-viscosity hypothesis. The Baldwin-Lomax
[8], Spalart-Allmaras [9], and Menter shear-stress transport [10] turbulence models are currently
implemented to provide turbulence closure.

The temporal derivatives are cast as a fully implicit operator in physical time. For first- or
second-order discretizations in time, this produces an unconditionally stable scheme, which allows 
the time-step size to be chosen based on the temporal resolution needed in the solution rather than 
limited by the numerical stability requirements. The fully implicit terms are iteratively solved with 
multigrid acceleration rather than direct inversion, which would be too costly for the nonlinear three- 
dimensional Navier-Stokes equations. 


IMPLEMENTATION OF TIME-DEPENDENT METHOD 


Original TLNS3D Method 


In the original TLNS3D program, a semidiscrete cell-centered finite-volume algorithm, based on 
a Runge-Kutta time-stepping scheme [1, 11, 12], is used to obtain the steady-state solutions to the 
TLNS equations. A linear fourth-difference-based and nonlinear second-difference-based artificial
dissipation is added to suppress both the odd-even decoupling and the oscillations in the vicinity of 
shock waves and stagnation points, respectively. Both the scalar and matrix forms of the artificial 
dissipation models [13] are incorporated. 

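In the steady-state implementation, the physical time T is replaced by a pseudo time τ, which gives

\frac{\partial (J^{-1} U)}{\partial \tau} = -\frac{\partial F}{\partial \xi} - \frac{\partial G}{\partial \eta} - \frac{\partial H}{\partial \zeta} + \frac{\partial F_v}{\partial \xi} + \frac{\partial G_v}{\partial \eta} + \frac{\partial H_v}{\partial \zeta} \qquad (2)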

At steady state, the left-hand side of equation (2) disappears, and the right-hand side (the residual) 
goes to zero, so that any stable scheme may be used to advance the solution in pseudo time. 

In the original TLNS3D program, the solution is advanced with a five-stage Runge-Kutta time-stepping
scheme. Three evaluations of the artificial dissipation terms (computed at the odd stages)
are used to obtain a larger parabolic stability bound, which allows a higher CFL number in the
presence of physical viscous diffusion terms. Such a scheme is computationally efficient for solving 
both the steady Navier-Stokes and the steady Euler equations. The stability range of the numerical 
scheme is further increased with the use of an implicit residual smoothing technique that employs grid 
aspect-ratio-dependent coefficients [1, 14, 15]. 

The solution is advanced in pseudo time with the maximum allowable time step for each cell. 
The efficiency of the steady numerical scheme is also significantly enhanced through the use of 
a multigrid acceleration technique as described in reference [1]. The original TLNS3D program 
was extensively modified to facilitate solution of the flow fields over a wide range of geometric 
configurations through domain decomposition. This multiblock version of TLNS3D is referred to as 
TLNS3D-MB. A consequence of this work is the generalization of the boundary conditions of the 
program to easily accommodate any arbitrary grid topology. A detailed description of this capability 
is given in reference [16].

Time-dependent TLNS3D-MB


In the steady-state version of TLNS3D-MB, the following multistage Runge-Kutta scheme is used 
to solve (2): 


W^{(0)} = W^m

W^{(k)} = W^{(0)} - \alpha_k \Delta\tau J^{-1} \left[ C^{(k-1)}(W) - D_p^{(k)}(W) - D_a^{(k)}(W) + F^{(k-1)}(W) \right] \qquad (3)

W^{m+1} = W^{(K)}

where W is the solution vector for the discrete formulation, m is the counter for the Runge-Kutta
iterations, (k) is the kth of K Runge-Kutta stages, α_k is the coefficient for the kth Runge-Kutta stage,
C is the convective operator (evaluated at the previous Runge-Kutta stage), D_p and D_a are the physical
and artificial dissipation operators (evaluated at a linear combination of previous Runge-Kutta stages),
and F is the multigrid forcing function. The above solution procedure can be thought of as placing 
the equation to be solved (in this case the steady-state Navier-Stokes equations) on the right-hand side 
of the equation and adding a pseudo-time term on the left-hand side. (See equation (2).) The same 
type of procedure is used in the time-accurate version of TLNS3D-MB. In this case, however, the 
unsteady Navier-Stokes equations are placed on the right-hand side: 


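\frac{\partial (J^{-1} U)}{\partial \tau} = -\frac{\partial (J^{-1} U)}{\partial T} - \frac{\partial F}{\partial \xi} - \frac{\partial G}{\partial \eta} - \frac{\partial H}{\partial \zeta} + \frac{\partial F_v}{\partial \xi} + \frac{\partial G_v}{\partial \eta} + \frac{\partial H_v}{\partial \zeta} \qquad (4)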


The physical time derivative is then approximated as a finite difference, and the same type of Runge- 
Kutta scheme is used to advance the solution in pseudo-time: 

W^{(0)} = W^m

W^{(k)} = W^{(0)} - \alpha_k \Delta\tau J^{-1} \left[ C^{(k-1)}(W) - D_p^{(k)}(W) - D_a^{(k)}(W) + F^{(k-1)}(W) + \frac{1}{J^{-1}} \frac{W^{(k)} - W^n}{\Delta t} \right] \qquad (5)

W^{m+1} = W^{(K)}

where n is the physical time step counter. Note that for simplicity the physical time derivative has
been written as a first-order derivative; a higher order discretization can be used if more accuracy is
desired. Also note that all terms in (5), except for the second term in the physical time derivative,
are evaluated at the new physical time level n + 1.

Equation (5) cannot be solved directly because the term W^{(k)} appears on both sides of the
equation. Solving (5) for W^{(k)} gives

(1 + \alpha_k \lambda) W^{(k)} = W^{(0)} - \alpha_k \Delta\tau J^{-1} \left[ C^{(k-1)}(W) - D_p^{(k)}(W) - D_a^{(k)}(W) + F^{(k-1)}(W) - \frac{1}{J^{-1}} \frac{W^n}{\Delta t} \right] \qquad (6)

where λ is the ratio of the pseudo and physical time steps (λ = Δτ/Δt). However, (6) also is unacceptable because
the right-hand side does not go to zero as the Runge-Kutta iteration converges. The final form for
the Runge-Kutta stage for the time-dependent version of TLNS3D-MB is obtained by adding and
subtracting the term α_k λ W^{(k-1)} to the right-hand side of the equation:

(1 + \alpha_k \lambda) W^{(k)} = W^{(0)} + \alpha_k \lambda W^{(k-1)} - \alpha_k \Delta\tau J^{-1} \left[ C^{(k-1)}(W) - D_p^{(k)}(W) - D_a^{(k)}(W) + F^{(k-1)}(W) + \frac{1}{J^{-1}} \frac{W^{(k-1)} - W^n}{\Delta t} \right] \qquad (7)


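For second-order discretization of the physical time derivative, this becomes:

\left(1 + \tfrac{3}{2} \alpha_k \lambda\right) W^{(k)} = W^{(0)} + \tfrac{3}{2} \alpha_k \lambda W^{(k-1)} - \alpha_k \Delta\tau J^{-1} \left[ C^{(k-1)}(W) - D_p^{(k)}(W) - D_a^{(k)}(W) + F^{(k-1)}(W) + \frac{1}{J^{-1}} \frac{3 W^{(k-1)} - 4 W^n + W^{n-1}}{2 \Delta t} \right] \qquad (8)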
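The structure of the first-order stage update can be illustrated on a scalar model problem. The following is a minimal sketch, not the TLNS3D-MB implementation: it assumes the model equation dw/dT = -a w in place of the flow equations, omits the Jacobian, dissipation, and multigrid forcing terms, and uses a commonly quoted set of five-stage coefficients (1/4, 1/6, 3/8, 1/2, 1) that the text does not specify. All function names are hypothetical.

```python
# Assumed five-stage coefficients; the source does not list the values it uses.
ALPHAS = (1.0 / 4.0, 1.0 / 6.0, 3.0 / 8.0, 1.0 / 2.0, 1.0)


def residual(w, a):
    """Stand-in for the spatial operator of the model problem dw/dT = -a*w."""
    return a * w


def advance_physical_step(w_n, dt, dtau, a, n_iters=60):
    """Solve one backward-Euler physical time step by pseudo-time Runge-Kutta iteration."""
    lam = dtau / dt            # ratio of pseudo and physical time steps
    w_m = w_n                  # start the iteration from the previous time level
    for _ in range(n_iters):   # Runge-Kutta iterations (counter m)
        w0 = w_m
        w_prev = w0
        for alpha in ALPHAS:   # stages k = 1, ..., K
            # Unsteady residual evaluated at the previous stage; it goes to
            # zero when the implicit equation is satisfied.
            unsteady = residual(w_prev, a) + (w_prev - w_n) / dt
            # Point-implicit stage update, analogous to the first-order form.
            w_prev = (w0 + alpha * lam * w_prev - alpha * dtau * unsteady) / (1.0 + alpha * lam)
        w_m = w_prev
    return w_m


def integrate(w_start, dt, dtau, a, n_steps):
    """Advance n_steps physical time steps."""
    w = w_start
    for _ in range(n_steps):
        w = advance_physical_step(w, dt, dtau, a)
    return w
```

For this linear model the converged pseudo-time iteration should reproduce the backward-Euler recurrence w^{n+1} = w^n / (1 + a Δt), which gives a simple check that the bracketed term has been driven to zero.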


The Baldwin-Lomax turbulence model is considered a zero-equation turbulence model and is 
implemented as part of the solution of the Navier-Stokes equations. The one- and two-equation 
turbulence models are implemented such that their solution is decoupled from the Navier-Stokes 
equations. They do not contain physical time derivatives and are not treated in a time-accurate manner. 
From a heuristic standpoint, they can be considered frozen in time. The results presented below indicate 
that this is an acceptable implementation for the class of problems considered. Subsequent work [17] 
has indicated that the physical time derivatives should be included in the turbulence model to ensure
accuracy for a wide range of flows. 


NUMERICAL RESULTS 


To demonstrate the capability of the present method, the results of several numerical experiments 
are given. The first case that is examined is the unsteady flow over a two-dimensional circular cylinder 
with a Reynolds number of 3000 and a free-stream Mach number of 0.2. If the flow about the cylinder 
is impulsively started, the initial flow is symmetric with zero lift as the wake behind the cylinder begins 
to grow. As the wake continues to grow, it becomes unstable and begins to shed from alternate sides 
of the cylinder. This shedding is periodic in nature and is characterized by the Strouhal number. The 
experimentally obtained value of the Strouhal number for the above conditions is 0.21.

The present scheme was used to calculate the fully developed vortex shedding flow around the
cylinder. Two different grids were generated for the calculations: a fine grid with 257 x 129 points
around and normal to the cylinder, respectively, and a coarse grid generated by deleting every other 
point from the fine grid. The fine grid was generated using an algebraic method with simple power 
law stretching. The normal spacing at the cylinder for the fine grid was 0.0001 times the diameter of 
the cylinder, and the grid extended to 20 diameters from the center of the cylinder. The coarse grid 
had a normal spacing of 0.0002 with the same outer boundary. Points were clustered in the wake 
region for better resolution, as shown for the coarse grid in figure 1. Results were obtained for two 
time step sizes for both the Baldwin-Lomax and Spalart-Allmaras turbulence models. Second-order 
discretization of physical time was used for all unsteady calculations. The larger, nondimensional 
time step size of 0.4 gave approximately 50 time steps per period. The smaller time step of 0.2 gave 
approximately 100 steps per cycle. The predicted Strouhal number for each combination of grid, time 
step, and turbulence model is presented in table 1. The percent difference from the experimental value 
is given in parentheses. As would be expected for separated flow, the Spalart-Allmaras turbulence 
model produced more accurate results for each grid/time step combination. Time histories of the lift 
coefficient C_l are shown in figures 2 and 3 for the Baldwin-Lomax and Spalart-Allmaras turbulence
models. The small effect of the reduction of the time step size indicates that the larger time step (with 
50 time steps per cycle) is adequate to predict the Strouhal number. The difference in the results due 
to the change in grid spacing is much larger than the effect due to changes in the time step size. 


Table 1. Predicted Strouhal Number for Circular Cylinder (M∞ = 0.2, Re_D = 3000)

                     Baldwin-Lomax                   Spalart-Allmaras
              coarse grid     fine grid        coarse grid     fine grid
Δt = 0.40     0.197 (6.2%)    0.201 (4.3%)     0.211 (4.8%)    0.207 (1.4%)
Δt = 0.20     0.198 (5.7%)    0.202 (3.8%)     0.219 (4.3%)    0.208 (1.0%)
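As an illustration of how a Strouhal number might be extracted from a lift history such as those in figures 2 and 3, the sketch below estimates the shedding frequency from upward zero crossings of the lift signal and then forms St = f D / U. This is only one possible approach; the text does not say how the tabulated values were obtained, and the function names and the synthetic test signal are assumptions made here.

```python
import math


def shedding_frequency(times, lift):
    """Estimate the dominant frequency from upward crossings of the mean lift."""
    mean = sum(lift) / len(lift)
    crossings = []
    for i in range(1, len(lift)):
        y0 = lift[i - 1] - mean
        y1 = lift[i] - mean
        if y0 < 0.0 <= y1:
            # Linear interpolation between samples for the crossing time.
            frac = -y0 / (y1 - y0)
            crossings.append(times[i - 1] + frac * (times[i] - times[i - 1]))
    if len(crossings) < 2:
        raise ValueError("need at least two upward crossings")
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period


def strouhal(frequency, diameter, velocity):
    """Strouhal number St = f * D / U."""
    return frequency * diameter / velocity


def percent_difference(value, reference):
    """Absolute difference from a reference value, in percent."""
    return 100.0 * abs(value - reference) / reference


# Synthetic lift history with a known frequency of 0.2 (hypothetical data).
sample_times = [0.05 * i for i in range(2001)]
sample_lift = [math.sin(2.0 * math.pi * 0.2 * t) for t in sample_times]
sample_frequency = shedding_frequency(sample_times, sample_lift)
```

With the experimental value 0.21 as the reference, `percent_difference(0.197, 0.21)` evaluates to about 6.2, matching the first entry of the table.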


The second configuration considered was a two-dimensional rectangular cavity in a flat plate. To 
model a configuration tested experimentally [18], a cavity length of 3.0 inches and height of 0.5 inches 
were considered. The flat plate extended 10.4 inches upstream of the cavity. This gives a length to 
height ratio (L/H) of 6. A free-stream Mach number of 0.3 and a Reynolds number of 300,000/inch 
were used. A transition grit was applied near the plate leading edge to force the boundary layer to
transition to a turbulent boundary layer. For these conditions, no tones were generated and the
flow was nearly steady.

A nonreflecting boundary condition was applied at the inflow boundary 21.6 inches ahead of the 
cavity. (See figure 4.) The upper computational boundary was set at 10 inches above the plate where 
a nonreflecting boundary condition was applied. An extrapolation boundary condition was applied at 
the outflow boundary 39.1 inches aft of the cavity. An algebraic grid generation technique was used 
to generate a two-block grid with 49 x 56 points in the cavity and 129 x 49 points above the cavity
and flat plate. Power law stretching was used to cluster points near the flat plate and the cavity walls
and floor with a spacing of 0.005 inches. A cosine function was used to transition from the clustered
grid near the surface to a specified fraction of uniform spacing near the far boundaries. (See figure 5.)

To obtain reasonable starting conditions, TLNS3D-MB was run in steady mode (pseudo-time 
marching). After a reasonable number of multigrid cycles, the calculation was stopped and then 
restarted in unsteady mode with second-order physical time discretization. It has been found that this 
is an effective method for starting unsteady calculations. The lift histories for a laminar calculation 
and turbulent calculations using the Baldwin-Lomax, Spalart-Allmaras, and Menter models are shown 
in figure 6. Note that the laminar results exhibit periodic behavior, while the turbulent results appear 
to approach a steady solution. The turbulent cases were all started from an unsteady laminar solution 
to try to force oscillations, but all models showed a damping of the oscillations. Detailed examination 
of the solution shows small oscillations, but the predicted flow is essentially steady. This result is in 
line with experimental observations of the differences between laminar and turbulent flows in cavities 
[19-21]. The topology of the flow field in the cavity predicted by the turbulent runs is characterized 
by a large recirculation region that fills most of the cavity. Small secondary vortices are also present 
in the lower corners of the cavity. A sample of this is shown in figure 7. The topology of the laminar
solution is very different. Multiple nonstationary vortices appear in the cavity and then either die out 
or are convected out of the cavity. Streaklines at various times are shown in figures 8 and 9. 

The computed pressure coefficient along the centerline of the floor of the cavity from the present 
turbulent calculations is compared with experimental values in figure 10. Once again, the agreement 
for the Baldwin-Lomax model is not as good as for the one- or two-equation turbulence models for
separated flow. None of the models predicts the high pressure at the rear of the cavity as seen in the 
experimental data. This result may be due to three-dimensional effects in the experiment. 

To demonstrate the capability of the present method to calculate three-dimensional flows, a three- 
dimensional laminar calculation was performed for the same L/H = 6 cavity with a width to height 
ratio (W/H) of 5. The surface grid and a portion of the outer boundary for this calculation are 
shown in figure 11. The two-dimensional grid shown previously is the grid from the cavity centerline 
plane from this three-dimensional grid. The lift and drag (based on integrated pressures) histories 
of this calculation are shown in figure 12. The flow exhibits the same unsteady properties that the 
two-dimensional laminar calculation contained, although large three-dimensional effects are apparent, 
as evidenced by the streaklines for a selected time shown in figure 13. This calculation required 
approximately 50 CPU hours on a Cray C-90. 


CONCLUSIONS 


A method to accurately calculate solutions to the unsteady Navier-Stokes equations has been 
presented. Multigrid acceleration has been successfully employed to accelerate the calculations of the 
iterative-implicit method. Examples for two-dimensional turbulent flow past a circular cylinder and 
a rectangular cavity, using the Baldwin-Lomax, Spalart-Allmaras, and Menter shear-stress transport 
models, have been presented to show that a frozen implementation of these ‘steady’ turbulence models
can give good results for these unsteady separated flows. The time-dependent scheme has also been
demonstrated for a three-dimensional laminar calculation.

Figure 1. Coarse cylinder grid (129 x 65).

Figure 2. Lift history for circular cylinder with
Baldwin-Lomax turbulence model (M∞=0.2, Re_D=3000).

Figure 3. Lift history for circular cylinder with
Spalart-Allmaras turbulence model (M∞=0.2, Re_D=3000).

Figure 7. Sample streaklines for turbulent (Spalart-Allmaras) calculation
of two-dimensional cavity (L/H=6, M∞=0.4, Re=300,000/inch).

Figure 8. Sample streaklines at T=109.5 for laminar calculation
of two-dimensional cavity (L/H=6, M∞=0.4, Re=300,000/inch).

Figure 9. Sample streaklines at T=120.75 for laminar calculation
of two-dimensional cavity (L/H=6, M∞=0.4, Re=300,000/inch).

Figure 10. Pressure coefficient along cavity floor for two-dimensional
rectangular cavity (L/H=6, M∞=0.4, Re=300,000/inch).

Figure 11. Surface grid for three-dimensional rectangular cavity calculations (L/H=6, W/H=5).

Figure 13. Sample streaklines for laminar calculation of
three-dimensional cavity (L/H=6, W/H=5, M∞=0.4, Re=300,000/inch).

INTRODUCTION


In this paper we consider the simultaneous flow of oil and water in reservoir rock. This displacement 
process is modeled by two basic equations (see, e.g., [1]): the material balance or continuity equations 
and the equation of motion (Darcy’s law). For the numerical solution of this system of nonlinear 
partial differential equations there are two approaches: the fully implicit or simultaneous solution 
method and the sequential solution method. 

In the sequential solution method the system of partial differential equations is manipulated to 
give an elliptic pressure equation and a hyperbolic (or parabolic) saturation equation. In the IMPES 
approach the pressure equation is first solved, using values for the saturation from the previous time 
level. Next the saturations are updated by some explicit time stepping method; this implies that 
the method is only conditionally stable. For the numerical solution of the linear, elliptic pressure 
equation multigrid methods have become an accepted technique. (See, e.g., [2],[3],[4].) 

On the other hand, the fully implicit method is unconditionally stable, but it has the disadvantage 
that in every time step a large system of nonlinear algebraic equations has to be solved. The most 
time-consuming part of any fully implicit reservoir simulator is the solution of this large system of 
equations. Usually this is done by Newton’s method. The resulting systems of linear equations are 
then either solved by a direct method or by some conjugate gradient type method. 

In this paper we consider the possibility of applying multigrid methods for the iterative solution 
of the systems of nonlinear equations. There are two ways of using multigrid for this job: either 
we use a nonlinear multigrid method or we use a linear multigrid method to deal with the linear 
systems that arise in Newton’s method. So far only a few authors have reported on the use of 
multigrid methods for fully implicit simulations. In [5] a two-level FAS algorithm is presented for
the black-oil equations, and linear multigrid for two-phase flow problems with strong heterogeneities 
and anisotropies is studied in [6]. Here we consider both possibilities. Moreover we present a 
novel way for constructing the coarse grid correction operator in linear multigrid algorithms. This 
approach has the advantage in that it preserves the sparsity pattern of the fine grid matrix and it 
can be extended to systems of equations in a straightforward manner. We compare the linear and 
nonlinear multigrid algorithms by means of a numerical experiment. 

EQUATIONS 


In the absence of gravity forces the volumetric flow rate of water and oil in a porous medium is given 
by the generalized Darcy’s law 

q_\alpha = -\lambda_\alpha \nabla P_\alpha, \qquad \alpha = w, o, \qquad (1)

where q_α, λ_α, and P_α are the Darcy velocity, the mobility, and the pressure of phase α, respectively.
The saturation of phase α is denoted by S_α, so

S_w + S_o = 1. \qquad (2)

The phase mobilities λ_α are defined by

\lambda_\alpha = \frac{k \, k_\alpha(S_\alpha)}{\mu_\alpha}, \qquad \alpha = w, o, \qquad (3)

where k is the rock permeability, k_α(S_α) is the phase relative permeability, and μ_α is the phase
viscosity. In addition to these momentum equations we have mass conservation laws for both phases: 

d S 

' ^ = (4) 

where φ is the porosity of the rock and Q_α is the production rate of phase α. The phase pressures
P_α are related through the capillary pressure P_c:

P_c(S_w) = P_o - P_w. \qquad (5)

The equations (1)-(5) are the partial differential equations that make up the incompressible
two-phase flow model. In the sequel we use S_w and P_o as the independent variables and drop the
subscripts.

We still have to specify the boundary conditions. Usually the flow across well boundaries is 
modeled by point sources and sinks, and no flow boundary conditions are imposed at the boundary 
of the reservoir. This has the effect of shifting all complications to a proper modeling of the injection 
and production wells. 


DISCRETIZATION 


In this section we describe the fully implicit discretization of the multiphase flow equations. For ease 
of notation we assume a uniform porosity φ and rock permeability k. Moreover we only consider
the two-dimensional case with a uniform Cartesian grid. The equations are discretized in space by 
a finite volume scheme (cell-centered finite-differences or box scheme). For the time integration the 
backward Euler method is used. This leads to the system of equations 




I = o, 

I = o. 


( 6 ) 


(7) 


In the above, h denotes the mesh width; the subscripts i, j, the discretization cell; and the superscript 
n, the time level. The fluxes at the edges between cells are approximated with upstream weighted 
mobilities. For example, the fluxes j at the edge between the cells i,j and i,j + \ are 

approximated by 




pn + l _ ( p 1 " + 1 _ pn + 1 , ( p \n + l 
\n + l -‘i + l.j + "i,j 

pn+1 p!^+i 

l\ 1^^ + ^ »+l|j i,i 

. 1 


( 8 ) 

(9) 


with 


n + 1 









( 10 ) 


In the case of nonuniform rock permeability k, the permeability k_{i+1/2,j} at the cell edge is the harmonic
average of the values in the adjacent cells.
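The two ingredients of the edge approximation, upstream weighting of the mobility and harmonic averaging of the permeability, can be sketched as follows. This is an illustrative fragment rather than the discretization used in the paper: the function names are hypothetical, the relative permeability is passed in as an arbitrary function, and the edge quantity returned is the Darcy velocity normal to the edge between a left and a right cell on a uniform grid of width h.

```python
def harmonic_average(k_left, k_right):
    """Edge permeability as the harmonic average of the two adjacent cell values."""
    if k_left <= 0.0 or k_right <= 0.0:
        return 0.0  # an impermeable neighbour blocks the edge
    return 2.0 * k_left * k_right / (k_left + k_right)


def phase_mobility(k_edge, rel_perm, viscosity):
    """Mobility = permeability * relative permeability / viscosity."""
    return k_edge * rel_perm / viscosity


def upstream_edge_velocity(p_left, p_right, s_left, s_right,
                           k_left, k_right, viscosity, h, rel_perm_fn):
    """Darcy velocity across an edge with upstream-weighted mobility.

    The saturation used in the relative permeability is taken from the cell
    with the higher phase pressure, i.e. the cell the phase flows out of.
    """
    k_edge = harmonic_average(k_left, k_right)
    s_upstream = s_left if p_left >= p_right else s_right
    lam = phase_mobility(k_edge, rel_perm_fn(s_upstream), viscosity)
    return -lam * (p_right - p_left) / h


def quadratic_rel_perm(s):
    """Assumed relative permeability model, used only for illustration."""
    return s * s
```

Because the upstream cell changes when the pressure difference changes sign, the edge mobility switches between the two cell values; this is part of what makes the discrete system nonlinear in the pressure as well as in the saturation.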
MULTIGRID


In each time step we have to solve the large system of nonlinear equations (6)-(10). We consider
cell-centered multigrid methods for the iterative solution of these systems. In cell-centered multigrid
methods the coarser grids G^{2h}, G^{4h}, ... are constructed by successively doubling the mesh width
of the fine grid G^h. Hence, each coarse grid cell is the union of four fine grid cells. In this paper
we focus on the coarse grid correction. Suppose that on the fine grid G^h we have the system of
equations

N^h(u^h) = f^h, \qquad (11)

where N^h is a possibly nonlinear operator. The coarse grid corrections that we consider are of the
form

+R2fe(/'‘ (12) 

v!^ = v!^ + (13) 

where iijh denotes the restriction that is the adjoint of the interpolation by a piecewise constant 

function. In the cell centered multigrid method this is natural: the residual (the total excess of 
accumulation and net flow) in a coarse grid cell is the sum of the residuals in the corresponding 
four fine grid cells. The prolongation is the piecewise bilinear interpolation. This combination 
of prolongation and restriction is formally sufficiently accurate to deal with second order partial 
differential equations. 
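The statement that the residual restriction is the adjoint of piecewise constant interpolation can be checked directly on a small grid. The sketch below is an illustration with hypothetical function names; it uses piecewise constant prolongation only to exhibit the adjoint relation, whereas the correction itself is prolongated bilinearly as described above.

```python
def restrict_sum(fine):
    """Coarse-cell value = sum of the four fine-cell values it contains."""
    n = len(fine) // 2
    return [[fine[2 * i][2 * j] + fine[2 * i + 1][2 * j]
             + fine[2 * i][2 * j + 1] + fine[2 * i + 1][2 * j + 1]
             for j in range(n)]
            for i in range(n)]


def prolong_constant(coarse):
    """Piecewise constant interpolation: each fine cell copies its parent cell."""
    n = len(coarse)
    return [[coarse[i // 2][j // 2] for j in range(2 * n)]
            for i in range(2 * n)]


def dot(a, b):
    """Euclidean inner product of two grid functions stored as nested lists."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Adjoint check on a 4 x 4 fine grid: <R r, v> on the coarse grid
# equals <r, P v> on the fine grid.
fine_residual = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse_vector = [[1, -2], [3, 5]]
lhs = dot(restrict_sum(fine_residual), coarse_vector)
rhs = dot(fine_residual, prolong_constant(coarse_vector))
```

Summing rather than averaging is what keeps the coarse residual a total excess of accumulation and net flow over the coarse cell, consistent with the finite volume interpretation.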

We will now develop two multigrid methods for (6)-(10). In the nonlinear multigrid method (the 
FAS algorithm [7]) we deal with this nonlinear system of equations directly, so N^h is a nonlinear
operator. On the other hand, in the linear multigrid method N^h is the Jacobian matrix of the system
of nonlinear equations. We present a novel way to construct the coarse grid correction operator for 
the linear multigrid algorithm. 


Nonlinear Multigrid 

The nonlinear multigrid method that we use is the FAS algorithm. To obtain the coarse grid 
operator N^{2h} the problem is discretized on the coarse grid (i.e., a grid with mesh size 2h). There
are only homogeneous boundary conditions; therefore, the treatment of the boundary conditions on
the coarse grids is trivial. If there is a well in a grid cell on the fine grid, then it is also present
in all father cells on coarser grids. Because the problem is nonlinear, the properties of the coarse 
grid operators are determined by the choice of Here we take , where i® ^he 

interpolation by piecewise constants. 

We use a collective point Gauss-Seidel-Newton method as the smoother in this multigrid algorithm.
This means that all cells are visited in some predetermined order, and equations (6) and
(7) are solved simultaneously for the variables related to that cell. This system of two nonlinear 
equations is solved by Newton’s method. 


Linear Multigrid 

We can also use multigrid to solve the linear systems of equations that occur when applying Newton’s 
method on the fine grid G^h. Let us again consider the construction of the coarse grid linear operator
that is used in the coarse grid correction. Basically there are two ways to define this coarse grid 
operator. Given prolongation and restriction operators we can define the coarse grid operator as the 
Galerkin approximation to the fine grid operator; this is done in [6]. This approach is straightforward 
but it has a disadvantage in that for simple linear elliptic equations the coarse grid matrix may lose
the M-matrix property. Moreover, the stencils of the coarse grid operators are often denser than 
the corresponding fine grid stencil (cf. [8]). The alternative approach is to discretize the problem
on the coarse grid as in the nonlinear multigrid algorithm and to use the Jacobian of the nonlinear
coarse grid operator as the coarse grid operator for the linear multigrid algorithm. An advantage to 
this approach is that all nice properties of the fine grid operator are immediately carried over to the 
operators on the coarser grids. We now try to combine these approaches; the coarse grid operator 
is defined by means of a Galerkin-like construction that is based on the coarse grid discretization 
approach. 


To explain this construction we consider a simple one-dimensional, second-order scalar conservation
law

\frac{d}{dx} q(u, u_x) = f, \qquad (14)

where q is some function of the solution u and u_x. A simple finite volume discretization on the fine
grid with uniform mesh width h leads to a system of equations of the form

q_{i+1/2} - q_{i-1/2} = h f_i, \qquad (15)


with 




(16) 


Suppose that we use Newton’s method on the grid G^. In a single iteration step we then solve the 
following problem: find AttJ’ such that 



hft - {Qi+i - qi_i) - Mi+i - 

(17) 

with 


(18) 

This can be written in matrix form:




(19) 

For example, let us consider the linear convection-diffusion equation



d / dti\ 

(20) 


dir+'s)”"' 


with boundary conditions u(0) = 0 and u(l) = 1. A forward discretization for the convective term 
yields 






+ 6 ' 




( 21 ) 


If we use discretization on the coarse grid G^{2h} to define the coarse grid operator, its stencil is given
by




( 22 ) 


Interpolation by piecewise constants, which is the natural choice for prolongation and restriction in 
multigrid algorithms for finite volume schemes, is of course insufficiently accurate for this second 
order problem. However, if we construct the coarse grid operator as the Galerkin approximation 
using these natural transfer operators, we obtain the coarse grid stencil 



(23) 


Clearly the treatment of the second order diffusion term is different for the finite volume discretization
approach (22) and the Galerkin approximation (23).
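The difference in the diffusion term can be reproduced numerically. The sketch below is an illustration under stated assumptions: it treats only the diffusive part of the operator, written in finite volume form with interior stencil (ε/h)[-1 2 -1], ignores boundary rows, and uses piecewise constant prolongation with its transpose as restriction. The function names and the sign convention are choices made here, not taken from the paper.

```python
def diffusion_matrix(n, eps, h):
    """Finite volume diffusion operator with interior stencil (eps/h) * [-1, 2, -1]."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0 * eps / h
        if i > 0:
            a[i][i - 1] = -eps / h
        if i < n - 1:
            a[i][i + 1] = -eps / h
    return a


def galerkin_piecewise_constant(a):
    """Compute P^T A P for piecewise constant P (coarse cell I = fine cells 2I, 2I+1)."""
    nc = len(a) // 2
    return [[sum(a[i][j] for i in (2 * I, 2 * I + 1) for j in (2 * J, 2 * J + 1))
             for J in range(nc)]
            for I in range(nc)]


eps, n = 0.01, 16
h = 1.0 / n
galerkin = galerkin_piecewise_constant(diffusion_matrix(n, eps, h))
rediscretized = diffusion_matrix(n // 2, eps, 2.0 * h)

# Ratio of the interior stencil entries of the two coarse operators.
ratio_diag = galerkin[3][3] / rediscretized[3][3]
ratio_offdiag = galerkin[3][2] / rediscretized[3][2]
```

In this setting both ratios equal 2: the Galerkin operator built from piecewise constant transfers represents a diffusion coefficient twice as large as the one obtained by discretizing on the coarse grid.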

We compare the efficiency of these two methods by means of a simple numerical experiment.
We take the convection-diffusion equation (21) with ε = 0.01. In both cases we use a restriction
that is the transpose of piecewise constant interpolation and a prolongation by a piecewise linear
function. For smoothing we apply damped Jacobi relaxation with a damping factor of 2/3. We
do not use a Gauss-Seidel smoother because it is an exact solver for the pure convection equation.
Therefore, it is not suitable for comparing the merits of the different coarse grid correction operators
in the convection dominated case. Table 1 shows the observed two-level convergence rates for
both algorithms with one (ν = 1) and two (ν = 2) smoothing steps. If the mesh Peclet number
h/2ε is greater than 1 (convection dominates), then the convergence rates are comparable for both
algorithms. Applying two smoothing steps improves the convergence rates, so low frequency error
components are indeed reduced efficiently in the coarse grid correction. However, when diffusion
dominates, the two-grid algorithm with the Galerkin approximation performs worse than the coarse
grid discretization approach. Applying two smoothing sweeps hardly improves the convergence rate
of the algorithm with the Galerkin approximation. As the grid interpolation operators used in its
construction are too inaccurate, the coarse grid correction is incorrect.

                         Galerkin               FVD
    h        h/2ε     ν = 1    ν = 2      ν = 1    ν = 2
   1/8       6.25     0.60     0.38       0.53     0.36
   1/16      3.12     0.58     0.42       0.54     0.37
   1/32      1.56     0.55     0.44       0.50     0.35
   1/64      0.78     0.54     0.45       0.47     0.31
   1/128     0.39     0.54     0.47       0.45     0.32
   1/256     0.20     0.53     0.47       0.42     0.30
   1/512     0.10     0.53     0.47       0.43     0.28

Table 1: Two-level convergence rates for the linear convection-diffusion equation with two different
coarse grid operators: the Galerkin approximation and Finite Volume Discretization.

Comparing the coarse grid stencils (22) and (23) suggests another approach for the construction 
of the coarse grid matrix (cf. [8]). Let us assume that we can split the derivatives in terms with 
different order behavior with respect to the mesh size h: 


p=o.i 






(24) 

(25) 


with 

jq ^ ^„ ,(u —/lAu, u-I-fiAu, h) = Cl(h“^) for h —> 0. (26) 

‘+3 ' 

For the forward discretization (20) of the linear convection-diffusion equation this leads to a splitting 
in convective and diffusive terms: 



0 , 





Let the matrix consist of the elements ,, , so 


(27) 


J’' = ( 28 ) 

p=o,i 

We define the coarse grid operator now as follows: 








where and interpolation by piecewise constants. For the example of the linear 

convection-diffusion equation this yields exactly the same coarse grid operator as the one obtained
by discretization on the coarse grid (cf. (22)).
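The effect of the splitting can be illustrated on the 1D model problem. The sketch below is an assumption-laden illustration, since the defining formula of the coarse grid operator is not legible here: it assumes that each Galerkin term Pᵀ Jᵖ P, built with piecewise constant P, is scaled by 2⁻ᵖ, where Jᵖ collects the terms of order h⁻ᵖ. The finite volume stencils, the factor 2⁻ᵖ, and all function names are hypothetical.

```python
def galerkin_stencil(s):
    """Coarse 3-point stencil of P^T A P for a fine 3-point stencil
    s = (west, centre, east) and piecewise constant P (two fine cells per coarse cell)."""
    def a(d):                                   # fine matrix entry A[i][i+d]
        return s[d + 1] if -1 <= d <= 1 else 0.0
    # coarse entry (I, I+k): sum over fine rows {0, 1} and fine columns 2k + {0, 1}
    return tuple(sum(a(2 * k + j - i) for i in (0, 1) for j in (0, 1))
                 for k in (-1, 0, 1))


def split_coarse_stencil(terms, h):
    """terms maps the order p to a function h -> fine stencil of size O(h**-p)."""
    total = [0.0, 0.0, 0.0]
    for p, stencil in terms.items():
        g = galerkin_stencil(stencil(h))
        for k in range(3):
            total[k] += 0.5**p * g[k]           # assumed scaling 2**-p per order
    return tuple(total)


eps, h = 0.01, 1.0 / 16
terms = {
    0: lambda h: (0.0, -1.0, 1.0),                          # convective flux difference, O(1)
    1: lambda h: (eps / h, -2.0 * eps / h, eps / h),        # diffusive flux difference, O(1/h)
}
split = split_coarse_stencil(terms, h)
redisc = tuple(c + d for c, d in zip(terms[0](2 * h), terms[1](2 * h)))
print(split, redisc)
```

With these assumptions the split construction reproduces the stencil obtained by discretizing directly on the grid with mesh size 2h.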

We use this approach for defining the coarse grid operator also in the case of a system of conservation
laws. For our two-phase flow model the fluxes are given by (8), (9), and (10). With obvious
abuse of notation we define the splitting as follows:


Ja,P. . 

= < 0 , a = w,o, 


.•1 

Ja.P. j 

= +(Aa),+i,y^, a = 

w,o, 


5(Au,)i4.ij P j + 


3w ,S ^ j 


h 


~ a ) ^ 


o'O 


- PiS 

o,S . ■ 

dSij h 

» 

io,5. . 

= 0 . 



(30) 

(31) 

(32) 

(33) 

(34) 

(35) 


The accumulation terms are of course treated as zero order terms. 

We notice that the implementation of (29) is simple due to the fact that we are using piecewise
constant grid interpolation operators. The entries of the fine grid matrix consist of terms related to
either cells (the accumulation terms) or to edges (the flux terms). The coarse grid matrices have the
same structure, where the coarse grid cells consist of four fine grid cells; the coarse grid edges consist
of two fine grid edges. Because we are using piecewise constant interpolation operators, (29) implies
that we can simply add the terms related to cells on the finest grid to the corresponding terms in
parent cells on all coarser grids. Next we calculate the flux terms $J_{\alpha,S}$ and $J_{\alpha,P}$. Each of these
terms can be associated with a unique edge between two cells. As we are using piecewise constant
interpolation operators and as the terms $J_{\alpha,S}$ and $J_{\alpha,P}$ appear with opposite sign in the linearized
discrete equations for the two cells (cf. (6) and (7)), it follows that these terms do not contribute
to the coarse grid matrix if the fine grid edge is not part of a coarse grid edge. However, if the fine
grid edge is part of a coarse grid edge, we add that coefficient, multiplied by the appropriate scaling
factor, to the coefficient at the parent edge. This is done recursively until we end at the coarsest
grid. The splitting in terms related to cells and edges thus yields a straightforward implementation
of (29).


NUMERICAL RESULTS 

In this section we show some results for the numerical simulation of the flooding of a typical laboratory
scale model. This problem is taken essentially from [9]. The model consists of a thin sand pack
simulating a quadrant of an infinitely repeating five-spot. Some properties of the model are shown
in Table 2. The model is placed horizontally, so the gravity effect can be neglected. Initially there
is a uniform saturation $S_i$ in the model. Water is then injected into one corner of the pack at a
constant rate $q_i$, and oil and water are produced at the opposite corner. Several cases are considered
with widely varying oil-water viscosity ratios $M = \mu_o/\mu_w$ (see Table 3). For these data the flow is
convection dominated, so steep gradients develop in the water saturation $S_w$. Because the transition
regions cannot be resolved on the coarser grids, this is an interesting test problem for the multigrid
algorithms. The functions $k_\alpha(S)$ and $P_c(S)$ are smooth functions and good approximations to the
data given in [9]:

k^S) = . ( 36 ) 
(37)


K{S) = 0.67 . 

Pc{S) = Q g 62.3 X 10^ [dyne/cm^]. (38) 

For the discretization of this problem we use several grids. The coarsest grid in all calculations is 
a 5 X 5 grid, and the fine grid contains 80 x 80 grid points, so the total number of unknowns for 
the fine grid is 12800. The calculation is stopped when three times the total pore volume has been 
injected. 

In all time steps the discrete problem is solved with a tolerance t < 1 x 10“^, where r denotes 
the £ 2 -iiorm of the residual scaled by the inflow 5 ,-At in that time step. The total oil balance error 
([initial — final oil in place]/cumulative oil production) is always less than 2 x 10“^. The time steps 
$\Delta t^n$ are selected in order to have changes in the saturation of approximately 0.05:

    \Delta t^{n+1} = \frac{0.05}{\| S^n - S^{n-1} \|} \, \Delta t^n .        (39)

The ratio $\Delta t^{n+1}/\Delta t^n$ is bounded between 0.5 and 2.0.
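The time step selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the saturation change is measured in the maximum norm (the norm used in the paper is not legible), and the function name next_time_step and its parameters are hypothetical.

```python
def next_time_step(dt, s_new, s_old, target=0.05, lo=0.5, hi=2.0):
    """Scale dt so that the next saturation change is roughly `target`.

    s_new and s_old are the saturations at time levels n and n-1.
    The ratio dt_next / dt is clipped to the interval [lo, hi].
    """
    change = max(abs(a - b) for a, b in zip(s_new, s_old))   # max norm (assumed)
    ratio = hi if change == 0.0 else target / change
    return dt * min(hi, max(lo, ratio))


# A change of 0.01 would allow a fivefold step, but the ratio is capped at 2.0.
print(next_time_step(1.0, [0.21, 0.40], [0.20, 0.40]))
# A change of 0.5 would call for a tenfold reduction, but the ratio is capped at 0.5.
print(next_time_step(1.0, [0.70, 0.40], [0.20, 0.40]))
```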

In Figure 1 the numerical approximation of the water saturation after injection of 0.25 times 
the total pore volume is plotted for test problems 1 and 2. In test problem 1 there is a favorable 
mobility ratio M, and the water displaces the oil in a piston-like manner. However, in test problem 
2 we have an unfavorable viscosity ratio. The water saturation at the shock front is now lower 
than in the previous case, and the water breakthrough occurs earlier. This is in agreement with 
the classical one-dimensional Buckley-Leverett theory. Figure 2 shows the volume of produced oil 
versus the volume of injected water expressed in pore volumes. These results are obtained on the 
80 X 80 grid. These production curves are (of course) in good agreement with the results presented 
in [9]. As expected from the Buckley-Leverett theory a large mobility ratio M leads to an inefficient 
oil recovery process. 

For our purposes, the convergence speeds of the two multigrid algorithms that we are considering
are more interesting. To estimate the convergence speed of the nonlinear multigrid algorithm, we use
the average residual factor $\rho_{nmg}$. Here we take the $\ell_2$-norm of the residual of the nonlinear discrete
equations ((6) and (7)). The convergence speed of the linear multigrid algorithm is estimated by
the average residual factor $\rho_{lmg}$, which uses the $\ell_2$-norm of the residual of the linear equations
in Newton's method. In all runs we used F-cycles with a single smoothing step for pre- and
post-smoothing. Because the flow is basically from the injection corner toward the production corner,
a single Gauss-Seidel sweep suffices. In more complicated situations a four direction Gauss-Seidel
method has to be used.
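The text does not spell out how the average residual factor is computed. A common choice, assumed here, is the geometric mean of the residual reduction per cycle; the function name below is hypothetical.

```python
def average_residual_factor(residual_norms):
    """Geometric-mean reduction per cycle from a history of residual norms.

    residual_norms[0] is the initial residual norm, residual_norms[k] the norm
    after k multigrid cycles (assumed definition, not taken from the paper).
    """
    cycles = len(residual_norms) - 1
    if cycles < 1:
        raise ValueError("need at least two residual norms")
    return (residual_norms[-1] / residual_norms[0]) ** (1.0 / cycles)


# Three cycles, each reducing the residual by one order of magnitude.
print(average_residual_factor([1.0, 1e-1, 1e-2, 1e-3]))
```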

In Table 4 we show the estimated convergence speeds ρ on different fine grids for the different
test cases. In all cases both multigrid algorithms perform satisfactorily; we observe a fast,
grid-independent convergence behavior. The average residual reduction factor ρ is always less than 0.15.
In the nonlinear multigrid algorithm we find that typically three or four F-cycles are needed to satisfy
the stopping criterion. In the linear multigrid algorithm typically two Newton steps are needed for
convergence; altogether, typically four F-cycles per time step are needed. In Table 5 the average
execution times on an HP-735 workstation are shown. Although our code is far from optimal, two
tentative conclusions can be drawn from it. First, both algorithms show optimal complexity; the
time needed per time step and per grid point is independent of the number of grid points. Second,
the linear multigrid algorithm is more efficient than the nonlinear one. This is due to the fact that in
the nonlinear algorithm functions like $k_\alpha(S)$ and $P_c(S)$ (and their derivatives) have to be calculated
much more often.


SUMMARY 

We have presented two multigrid algorithms for the fully implicit simulation of incompressible, 
immiscible two-phase flow in a porous medium. The nonlinear multigrid algorithm is a standard 
Figure 1: Water saturation for Test 1 (left) and Test 2 (right) after injection of 0.25 pore volume.

Figure 2: Oil production curves for the different test problems.
2
0.09

■iffEI 

||K I 

Esa 

0.11 

QEl 

0.17 
20 X 20

0.10 

0.13 

Q a 

0.16 

0.12 

119 

0.16 

Hi 

40 X 40 

0.10 


Ell 

Esa 

0.12 

Hi 

0.15 

0.14 

80 X 80 

0.10 

0.11 

0.14 

Hi 

0.10 

0.11 

0.14 



Table 4: Convergence rates for different test cases. 


grid 
NLMLTG

10 X 10 


1.53 

20 X 20 


1.89 

40 X 40 


2.09 

80 X 80 

0.53 

2.09 


Table 5: Typical execution times [msec] per time step per grid point. 
FAS algorithm. The linear multigrid algorithm that is used to solve linear systems in Newton's
method employs a nonstandard construction for the coarse grid matrix. Both algorithms perform
satisfactorily for a simple 2D test problem. The linear multigrid algorithm appears to be more
efficient with respect to the execution time needed.
ABSTRACT


Over the years, multigrid has been demonstrated as an efficient technique for solving inviscid flow problems.
However, for viscous flows, convergence rates often degrade. This is generally due to the required use of
stretched meshes (i.e., the aspect-ratio AR = Δy/Δx << 1) in order to capture the boundary layer near the
body. Usual techniques for generating a sequence of grids that produce proper convergence rates on isotropic
meshes are not adequate for stretched meshes. This work focuses on the solution of Laplace's equation,
discretized through a Galerkin finite-element formulation on unstructured stretched triangular meshes. A
coarsening strategy is proposed and results are discussed.


Introduction 

The multigrid method has been shown to be successful for solving elliptic problems. This is mainly due to its
good damping properties which result from two very simple principles. A usual Fourier analysis demonstrates
that most of the commonly used solvers effectively damp the high frequencies of a signal. A low frequency
component of a given signal on a fine mesh becomes a high frequency on a coarser one, hence the idea of
solving the same problem on a sequence of meshes where all frequencies can be damped equally and, if
enough grids are available, only a few iterations will be required to produce a converged solution (for more
details see [1]). Despite these rather simple considerations, the multigrid algorithm is complex and difficult
to implement. One of the difficulties resides in the generation of the sequence of grids for unstructured
meshes. The convergence properties of the multigrid method depend upon the “quality” of these grids.

A sequence of meshes may be produced through two different methods. First, starting from a mesh that
is not too fine but correctly represents the problem, finer meshes may be generated through refinement.
A global refinement, performed through local subdivision of the triangles of the discretization, tends to
preserve the geometrical features required to obtain an efficient multigrid method. However, this will clearly
not be efficient in terms of computational cost, hence the local refinement technique where specific regions
of the mesh are refined and then possibly adapted [2]. Although this method seems more reasonable, it
increases the computational time and the complexity of the multigrid algorithm. Another method consists
in coarsening an existing fine mesh, which has been created to represent accurately the different phenomena
to be observed. One of the techniques available consists in removing, through a coarsening criterion, a certain
number of nodes from the initial mesh and reconnecting (retriangulating) the remaining set of nodes. This
method is especially effective in the case of non-stretched meshes [3]. The reconnection usually relies on
the Delaunay technique [4] that tends to produce the “most equilateral” triangulation for the given point
distribution and therefore is not easily applicable to stretched meshes. In order to avoid retriangulation,
the so-called agglomeration technique (see Lallemand et al. [5]) is interesting. The generation of coarser
meshes consists in the agglomeration, or fusion, of the control volumes of the discretization. However, for
consistency considerations, when it comes to viscous flows, more accurate intergrid transfer operators are
required [6, 7].

*This research was supported under NASA contract No. NAS1-19480 while the authors were in residence at
ICASE.

The following study focuses on the 2D Laplace's equation $\Delta u(x, y) = 0$, since the poor convergence
properties of the multigrid technique, observed when solving the Navier-Stokes equations on stretched meshes,
also appear for the solution of this simpler equation. The purpose of this work is to propose new coarsening
strategies that will preserve the convergence rate of the usual isotropic multigrid technique. This is defined
as a semi-coarsening method. This study will show how this process may be extended from the case of
regular structured grids to totally unstructured meshes.

The organization of the paper is as follows: the discretization of the 2D Laplace's equation is introduced
in Section 1 along with an edge-based data structure. Section 2 recalls the essential multigrid convergence
properties. The generation of stretched grids is addressed in Section 3. A semi-coarsening algorithm,
extended to unstructured meshes, is presented in Section 4. Finally, numerous experiments are discussed in
Section 5.


1 Laplace’s equation 



Figure 1: Linear basis function $\varphi_i$.

The problem consists in solving Laplace's equation:

    \Delta u(x, y) = 0    on \Omega, a convex polygonal domain,
    u = u_0               on \Gamma.                                  (1)


A Galerkin Finite-Element formulation is used on unstructured triangular meshes. An integration by parts
results in:

    \int_{\Omega_i} \Delta u \, \varphi_i \, d\omega = - \int_{\Omega_i} \nabla u \cdot \nabla \varphi_i \, d\omega + \int_{\Gamma_i} \nabla u \cdot n \, \varphi_i \, d\sigma        (2)

where $\varphi_i$ is the linear basis function as depicted in Fig.1. If u is piecewise linear, then the Green formula
and the notations of Fig.2 result in:


= — Ukj 

(Vu.)7', = (uififcj + UkHij - ujfiik) 

ZAi 


(3) 


where $u_i$ is the value of the solution u on vertex i, $A_1$ is the area of triangle $T_1$, and $n_{ij}$ is the vector normal to the
edge [i, j] with magnitude equal to the length of the edge. Equation (2) can be rewritten as:


Auipiduj - (V¥),:)7'; • (Vu)7’j dw = ■ (Vu)ri 


(4) 


Moreover, for the considered triangle $T_1$, (3) can be rewritten as:


1 

('*Ar)7'i ~ ~ ^‘^jk^Vji) 

2Ai (5) 

~ ~ AujkAxji) 

where $\Delta u_{ij} = u_j - u_i$. A similar formulation can be written for triangle $T_2$. In evaluating the coefficient for
the edge joining vertices i and j, only the triangles $T_1$ and $T_2$ will yield non-zero contributions. The final
expression of (4) is thus an edge-based formulation:




Am ipi dijj 


E 

edges 


/ ^VikAvjk ^yiAyji \ 
V A 2 ) 


+ 


/ AxjkAxjk AxiiAxji \ 

V >li A 2 ) 


Auij 


(6) 


where the sum is taken over all incoming edges for vertex i. The geometrical anisotropy is reflected in the
coefficient associated with each edge. If the length $\|ij\|$ increases (the nodes k and l being fixed) then the
value of the coefficient decreases. Therefore, considering the domain $\Omega_i = \bigcup_j T_j$ formed by the triangles
surrounding vertex i, the maximum coefficient is
associated with the smallest connecting edge and the minimum with the longest.
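The dependence of the edge coefficient on the edge length can be illustrated with the standard linear finite-element result, in which the coefficient of (u_j - u_i) for an edge [i, j] shared by triangles (i, j, k) and (i, j, l) is half the sum of the cotangents of the angles at k and l. The sketch below uses that textbook formula; its normalization may differ from the one in (6), and the function names are hypothetical.

```python
def tri_area(a, b, c):
    """Area of the triangle with vertices a, b, c given as (x, y) tuples."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def edge_coefficient(pi, pj, pk, pl):
    """Coefficient of (u_j - u_i) for edge [i, j] with opposite vertices k and l.

    Each triangle contributes (x_i - x_o) . (x_j - x_o) / (4 * area), which equals
    half the cotangent of the angle at the opposite vertex o.
    """
    def part(po):
        dot = (pi[0] - po[0]) * (pj[0] - po[0]) + (pi[1] - po[1]) * (pj[1] - po[1])
        return dot / (4.0 * tri_area(pi, pj, po))
    return part(pk) + part(pl)


# Vertices k and l are fixed; the edge [i, j] is lengthened symmetrically.
k, l = (0.0, 1.0), (0.0, -1.0)
coefs = [edge_coefficient((-a, 0.0), (a, 0.0), k, l) for a in (0.25, 0.5, 1.0)]
print(coefs)
```

In this configuration the coefficient decreases monotonically as the edge [i, j] grows, which is the behaviour described in the text.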


2 Some definitions and convergence results 

Multigrid theory relies on the use of a sequence of nested meshes for solving (1). These meshes represent
the different spaces where the equation is discretized. In what follows, only two meshes are considered:
$\mathcal{H}_h$ and $\mathcal{H}_H$, with H = 2h and $\mathcal{H}_H \subset \mathcal{H}_h$. The discrete problem on the fine grid is written as:

    A_h u_h = 0        (7)

A weighted Jacobi relaxation is considered as the basic iterative process or smoother:

    u_h^{n+1} = S_h u_h^n = (I - \omega D_h^{-1} A_h) u_h^n,   where D_h = (A_h)_{ii}        (8)

In order to use both spaces for solving (7) it is necessary to use transfer operators. A linear interpolation
$P : \mathcal{H}_H \to \mathcal{H}_h$ defines the prolongation operator, and its transpose $R = P^T : \mathcal{H}_h \to \mathcal{H}_H$ defines the
restriction. The 2-Grid iterative operator $M_h$ is then defined by:

    u_h^{n+1} = M_h u_h^n = (A_h^{-1} - P A_H^{-1} R)(A_h S_h^{\nu}) u_h^n

with $\nu_1 = \nu$ pre-relaxations and $\nu_2 = 0$ post-relaxations.
One very important feature of a multigrid (MG) algorithm is its mesh-independent convergence. According
to Hackbusch [8], mesh-independence for elliptic operators is achieved through the smoothing property
($\|A_h S_h^{\nu}\| \le \eta(\nu)\, h^{-2}$, where $\lim_{\nu \to \infty} \eta(\nu) = 0$) and the approximation property ($\|A_h^{-1} - P A_H^{-1} R\| = O(h^2)$).
Because of its nature, the MG algorithm converges linearly with respect to the number of MG-cycles.

Morano et al., in [3], showed that this may also be achieved for the Euler and low Reynolds number
Navier-Stokes equations where the employed meshes are not stretched. However, when highly-stretched
elements are used (mandatory for high Reynolds number solutions, see [7] for example), this convergence
greatly deteriorates with classical fully-coarsened (FC) grids. It is no longer linear nor mesh-independent.
The deterioration in convergence is also observed when the resolution of Laplace's equation is attempted
with highly stretched elements, that is, when the mesh is anisotropic.

3 A sequence of grids 

When very stretched elements are used, the damping properties of the smoother are negligible in the stretching
direction. Thus, using a full-coarsening strategy will certainly not improve the damping properties, since
the stretching is fully preserved on larger elements. Moreover, the distribution of nodes in the stretching
direction will correctly represent the low frequencies of the signal, whereas, in the direction normal to the
stretching, it will represent the high frequencies. Because of the nature of the smoothers commonly used, the
multigrid technique damps mainly the high frequencies, hence the idea of semi-coarsening in the direction
normal to the stretching.
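The loss of smoothing in the stretching direction can be made concrete with a local Fourier estimate. The sketch below is an illustration only: it assumes the 5-point Laplacian on a regular grid with Δy = AR · Δx as a stand-in for the finite-element operator, and uses the relaxation parameter ω = 0.85 quoted in Section 5. The function name is hypothetical.

```python
import math


def jacobi_amplification(theta_x, theta_y, aspect_ratio, omega=0.85):
    """Amplification factor of weighted Jacobi for the Fourier mode (theta_x, theta_y).

    Model operator: 5-point Laplacian with dy = aspect_ratio * dx (assumed).
    """
    wx = 1.0                          # 1/dx**2, in units of 1/dx**2
    wy = 1.0 / aspect_ratio**2        # 1/dy**2, in the same units
    symbol = wx * (1.0 - math.cos(theta_x)) + wy * (1.0 - math.cos(theta_y))
    return 1.0 - omega * symbol / (wx + wy)


ar = 0.25
# Mode oscillating only along the stretching (x) direction: almost undamped.
print(jacobi_amplification(math.pi, 0.0, ar))
# Mode oscillating only in the direction normal to the stretching: well damped.
print(jacobi_amplification(0.0, math.pi, ar))
```

For AR = 1/4 the first mode is reduced by only about 10 percent per sweep, whereas the second is reduced by about 40 percent, which motivates coarsening only in the direction normal to the stretching.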


Figure 3: Sequence of grids for MSG (levels 1 to 5).

The semi-coarsening technique is well known and used especially in the structured mesh community. For 
complex geometries, however, multiple directions within the mesh require semi-coarsening. A process named 
Multiple Semicoarsened Grid (MSG) Algorithm was introduced by Mulder [9]. This technique relies on the 
generation of numerous grids that are semi-coarsened (SC) from the finer grid in all possible directions as 
depicted in Fig.3. This ensures proper dissipation of the signal. A multigrid scheme is then implemented 
using all the grids which is complex and costly, especially for 3D problems [10]. Moreover, there is no possible 
extension of this technique to unstructured grids. 

The complexity of the usual multigrid technique also relies on the full-coarsening method. This technique
consists in removing every second vertex in each direction on a regular structured mesh, which reduces
the number of nodes of the coarse grid by a factor of 4. The V-cycle complexity of such a method
tends to 4/3 WUs (a Work Unit corresponds to the computation of one residual on the fine grid). The
semi-coarsening technique produces coarse grids with a number of nodes decreasing by a factor of 2 and the
overall complexity tends to 2. Therefore, such a method will cost more per cycle. However, it will be shown
that this technique allows a much better damping factor than a regular full-coarsening technique in the case
of stretched meshes.
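The two complexity limits follow from a geometric series. The sketch below assumes that each level costs work proportional to its number of nodes and that transfer costs are ignored; the function name is hypothetical.

```python
def v_cycle_work(coarsening_factor, levels):
    """Work of one V-cycle in fine-grid Work Units, summed over all levels.

    Level k is assumed to hold a fraction coarsening_factor**-k of the fine-grid nodes.
    """
    return sum(coarsening_factor ** (-k) for k in range(levels))


print(v_cycle_work(4, 30))   # full coarsening: approaches 4/3
print(v_cycle_work(2, 40))   # semi-coarsening: approaches 2
```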

The smoothing property is valid for the weighted Jacobi relaxation scheme applied in this study. The
effect of the approximation property is emphasized since it determines the mesh-independence of the convergence.
This property is verified when the discretized subspaces, defined by the sequence of coarser meshes
utilized within the MG algorithm, are nested. In this paper, the sequence of meshes is created through
a semi-coarsening technique followed by a retriangulation. When this strategy is applied to unstructured
meshes, the nestedness of the meshes is rather difficult to preserve. The nodes of the coarse grid form a
subset of the nodes of the fine grid which produces node-nested, but not element-nested, grids.





Figure 4: Coarse grid discretizations, AR = 1 (panels include b. Fully-Nested Grid and c. Node-Nested Grid).


The example depicted in Fig.4 shows how the convergence varies with respect to the nestedness of the meshes.
A non-stretched 81 node Cartesian mesh defines the fine grid (Fig.4.a). The boundary conditions are those
defined in Section 5. Three different coarse grids are considered. Each of them is a node-nested grid and
comprises 25 nodes. Fig.4.b shows a usual fully nested grid. Fig.4.c and d depict randomly coarsened grids.
On the right side of the grid shown in Fig.4.c a few elements are not nested. Finally, Fig.4.d depicts a
coarsened grid where the elements are anything but nested. Two-grid experiments (see Section 5.1) are
performed and Fig.4.e depicts the respective convergence histories. The convergence rate ranges from 0.15
to 0.31 for such a simple test-case. Therefore, the nestedness of the grids is of extreme importance in the
quality of the MG performance. Further results may be found in [11].


4 Semi-coarsening and unstructured meshes 

In what follows is presented a semi-coarsening technique that is applicable to unstructured meshes as well 
as to structured meshes. The technique may be seen as a variant of the Algebraic Multigrid (see [12]) in the 
sense that it necessitates a pre-processing stage that relies on the discretization of the equation for generating 
the coarse grids. As mentioned previously, the Galerkin discretization of Laplace’s equation amounts to a 
sum over edges. The value of the coefficient associated with each edge is determined by the geometry of the 
surrounding elements (triangles). The smaller the length of the edge, the larger the value of the coefficient. 
The semi-coarsening technique proceeds as follows: once a node is selected to remain on the coarse grid,
its neighbors must be scanned to determine which one of them has to be removed. The removed node
corresponds to the edge associated with the largest coefficient. The algorithm is two-fold. First, it has to go
through the mesh and select the nodes to remain on the coarse grid, and, second, for each selected node, it
has to determine which of its neighbors is to be removed. The setup employed for coarsening is the same as
that used for agglomeration in [13, 7].

Unstructured meshes for high-Reynolds number flow computations are essentially comprised of two regions:
one where the aspect-ratio is (very) small, where the viscous effects are dominant, and another one,
where the aspect-ratio is close to 1, far from the viscous effects (the farfield for example). In order to preserve
the low complexity of an MG algorithm it might be desirable to perform the semi-coarsening only in
the low aspect-ratio region, whereas a full-coarsening may be applied elsewhere. Again, this is similar to
an Algebraic Multigrid as described in [12]. This should provide a slightly better complexity than the one
obtained through semi-coarsening only. The algorithm is written as:


1. For each node i on the fine grid the average and maximum values of the coefficients $coef_i$ of its
connecting edges are computed: $avg_i$ and $max_i$.


2. The parameter $\beta = \frac{1}{N} \sum_{i=1}^{N} \frac{max_i}{avg_i}$ provides an indication of the anisotropy.


3. The determination, through a heaplist, of the vertex jpick that remains on the coarse grid is then
performed.


4. The removal of the connecting neighbor(s) of jpick is achieved through a coarsening criterion.
Goto [3].


The heaplist serves as an advancing front. The starting point of the front will determine the quality of the
subset of nodes which constitute the coarse grid. Since semi-coarsening consists in removing every second
vertex in the direction normal to the stretching, it is expected that the advancing front should be initiated
from the region comprising the lowest aspect-ratio elements (the surface of an airfoil for example). Therefore,
the following items are incorporated:

• Technical programming considerations make the front start first with the boundaries.

• The body and farfield extrema are retained on the coarse grid in order to preserve the general geometry
of the discretized domain.

• The heaplist is determined by a “key-function” [14]. This “key-function” is defined by the connecting
distance (minimum number of edges) to the boundary (or region where the aspect-ratio is minimum) of
the unprocessed vertex (not in the front). The result is a list of edges where the first edge is associated
with the minimum distance and jpick is its unprocessed vertex.

Once a node is selected to remain on the coarse grid, a semi-coarsening criterion determines which of the
$nb_{neigh}$ connecting neighbors of jpick is to be removed:

1. $nb_{max}$ is defined by the maximum number of nodes to be deleted:

if $max_{jpick} > \beta \, avg_{jpick}$ then $nb_{max} = 1$ (Semi-Coarsening),

else $nb_{max} = nb_{neigh}$ (Full-Coarsening).

2. The array $List_{jpick}$ contains the available unprocessed neighbors.

$n_{del}$, the number of deleted nodes, is set equal to 0.

3. The determination of the available local maximum coefficient is performed: $loc_{max} = \max_{i \in List_{jpick}} (coef_i)$.

4. A node $i \in List_{jpick}$ is removed if: $coef_i = loc_{max}$ and $loc_{max} > avg_{jpick}$. That is if its value is
equal to the maximum local coefficient and if this maximum is greater than the average value of all
the surrounding coefficients.

5. The array $List_{jpick}$ is updated along with the number of deleted nodes ($n_{del} \leftarrow n_{del} + 1$).

If $n_{del} < nb_{max}$ goto [3].
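The removal step for one retained node can be sketched as follows. This is a minimal illustration of the criterion above, not the authors' implementation: the heaplist front is omitted, β is passed in as a parameter, the loop simply stops when the local maximum no longer exceeds the average, and the function and argument names are hypothetical.

```python
def select_removed_neighbours(coef, available, beta):
    """Return the neighbours of the retained node jpick that are removed.

    coef      : dict mapping each neighbour of jpick to its edge coefficient
    available : iterable of neighbours that are still unprocessed
    beta      : anisotropy parameter of the mesh
    """
    avg = sum(coef.values()) / len(coef)
    # Semi-coarsening removes at most one neighbour, full-coarsening up to all of them.
    nb_max = 1 if max(coef.values()) > beta * avg else len(coef)
    candidates = {node: coef[node] for node in available}
    removed = []
    while candidates and len(removed) < nb_max:
        node = max(candidates, key=candidates.get)   # local maximum coefficient
        if candidates[node] <= avg:                  # criterion of step 4 fails
            break
        removed.append(node)
        del candidates[node]
    return removed


# Stretched cell (short vertical edges carry the large coefficients).
stretched = {"n": 4.0, "s": 4.0, "e": 0.25, "w": 0.25, "ne": 0.0, "sw": 0.0}
print(select_removed_neighbours(stretched, list(stretched), beta=2.0))
# Isotropic cell: all axis neighbours are removed.
isotropic = {"n": 1.0, "s": 1.0, "e": 1.0, "w": 1.0, "ne": 0.0, "sw": 0.0}
print(select_removed_neighbours(isotropic, list(isotropic), beta=2.0))
```

In the stretched case only one neighbour along a short edge is removed, so the coarsening acts in the direction normal to the stretching; in the isotropic case the criterion falls back to full coarsening.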

This algorithm clearly provides a semi/full-coarsening (S/FC) technique. Yet, if appropriate, the algorithm
only performs semi-coarsening or full-coarsening. Such an algorithm may be applied to unstructured
meshes as well as to structured meshes provided the considered discretization relies on an edge-based data 
structure. This algorithm relies on the discretization of the equation to be solved rather than on simple 
geometrical considerations. 

a. Delaunay - Max Min. b. Min Max - Variant. 

Figure 5: Retriangulation techniques. 



Once the subset of nodes of the fine grid is obtained after coarsening, it needs to be retriangulated. The 
reconnection relies here on a Delaunay method. This method has proved useful and efficient when used in 
conjunction with equilateral triangle types of meshes. The coarsening technique utilizing such an algorithm 
was introduced in [15]. Unfortunately, this method does not apply to highly stretched meshes. It usually 
results in a poor reconnection in the region where the nodes of the mesh are not regularly distributed. In 
order to overcome this difficulty, an edge-swapping technique may be employed [16, 17]. The Delaunay
reconnection of a set of four nodes results in two triangles where the minimum angle is maximized (Fig.5.a).
In lieu of preserving this connectivity it is possible to swap the edges by minimizing the maximum angle
of the two triangles (Fig.5.b). This technique has proved very efficient when used with an advancing front
technique for generating meshes, and is thus employed for the unstructured test-case in this paper. The 
reconnection of the structured coarse grids is performed through the usual Delaunay method. 


5 Results and comments 

In order to validate the previous concept, various test-cases are performed for solving Laplace's equation.
Results are presented on structured and unstructured meshes. The discretization domain for the structured
cases is defined by a square of surface 1, while the unstructured case is defined by a pentagon immersed in an
unstructured mesh. A non-stretched structured test-case serves as the standard test-case, since it provides
the best MG convergence. The relaxation parameter ω is equal to 0.85 and no optimization is performed
here. Two sweeps are performed on the fine grid. The transfer operators are linear and were introduced
in [18]. All cases are performed with Dirichlet boundary conditions. For the structured test-cases they are
defined by u(0, y) = 1, u(x, 1) = 2, u(1, y) = 3 and u(x, 0) = 4, and for the unstructured case they are equal
to -1 on the body and to 1 on the farfield. For all test-cases, the different grids used are presented along
with the convergence histories of the various schemes. The convergence histories depict the logarithm of the
norm of the normalized residual with respect to the number of cycles. The iteration is continued until
the residual on the fine grid has decreased by a factor of $10^{-10}$.
5.1 Two-Grid experiments

These experiments require a residual decrease on the coarse grid equal to 10“^°. The semi-coarsening-only 
($nb_{max} = 1$) option of the algorithm is used for the generation of the coarse grids.


Non-stretched Meshes. The aspect-ratio is equal to one and the grids are fully-nested. The fine and
coarse grids are similar to those depicted in Fig.4.a and b, with 4225 (65 x 65) and 1089 (33 x 33)
nodes, respectively. The coarse grid is a manually (M) fully-coarsened grid (i.e. the coarsening algorithm is
not involved). No anisotropy is encountered here and a solution is obtained after 12 cycles which corresponds
to a convergence rate of 0.15.
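The quoted rate is consistent with the cycle count if one assumes that the stopping criterion is a residual reduction of 10^{-10} on the fine grid and that the rate is the geometric mean reduction per cycle (both are assumptions made for this check):

```latex
\rho \approx \left( 10^{-10} \right)^{1/12} = 10^{-5/6} \approx 0.147
```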



Figure 6: Linear Meshes - AR = 1/4. Panels: a. 4257 Node Fine Grid; b. 1105 Node FC Grid (M); c. 2145 Node SC Grid (M); d. 2145 Node SC Grid (C); e. Resulting Convergence Histories.


Linear Meshes. A 4257 (33 x 129) node fine grid is built (Fig.6.a) where the distribution of nodes is 
linear in the vertical (normal to the stretching) direction and the aspect-ratio is equal to 1/4. Three types of 
coarser meshes are presented. Fig.6.b depicts a manually fully-coarsened 1105 (17 x 65) node coarse 
grid, which represents the classical coarsening technique. Fig.6.c and d depict two semi-coarsened 
grids. The first is obtained manually through a vertical semi-coarsening and is a 2145 (33 x 65) node 
coarse grid. The second is the result of the coarsening algorithm (C) applied to the fine grid; it is also a 
2145 node coarse grid. The triangulations of the two semi-coarsened grids differ although the 
subsets of nodes are the same. Yet, similar convergences are expected. Fig.6.e depicts the various 
convergence histories. The full-coarsening technique results in a convergence rate of 0.77 while both 
semi-coarsening techniques provide a convergence rate equal to 0.15, which is identical to the convergence 
rate of the non-stretched test-case. 
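
The node counts quoted for the manually coarsened structured grids follow from keeping every other node 
along the coarsened directions. The following sketch is an illustration only; the function name and the 
direction labels are hypothetical. 

```python
def coarsen(n_horizontal, n_vertical, directions):
    """Keep every other node along each direction named in `directions`."""

    def halve(n):
        # n nodes span n - 1 intervals; pairs of intervals are merged.
        return (n - 1) // 2 + 1

    nh = halve(n_horizontal) if "horizontal" in directions else n_horizontal
    nv = halve(n_vertical) if "vertical" in directions else n_vertical
    return nh, nv


fine = (33, 129)                                      # 4257 nodes
full = coarsen(*fine, {"horizontal", "vertical"})     # full coarsening
semi = coarsen(*fine, {"vertical"})                   # vertical semi-coarsening
print(full, full[0] * full[1])
print(semi, semi[0] * semi[1])
```

For the 33 x 129 fine grid this gives 17 x 65 = 1105 nodes for full coarsening and 33 x 65 = 2145 nodes for 
vertical semi-coarsening, as quoted above. 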



Panels of Figure 7: a. 4257 Node Fine Grid; b. 1105 Node FC Grid (M); c. 2145 Node SC Grid (M); 
e. Resulting Convergence Histories. 

Figure 7: Exponential Meshes - AR = 2.4 x 10“^. 

Exponential Meshes. A 4257 (33 x 129) node fine grid is depicted in Fig.7.a. The distribution of 
nodes is exponential in the vertical direction. The minimum aspect-ratio is equal to 2.4 X 10 ^ and the 
maximum to 2.2. This grid is manually fully-coarsened, which produces a 1105 (17 x 65) node coarse grid 
(Fig.7.b). A manually vertically semi-coarsened 2145 (33 x 65) node coarse grid is depicted in Fig.7.c. 
Where the stretching follows the horizontal direction (where the distribution of nodes is more dense), this 
technique provides the expected result, while it worsens the stretching where the latter follows the vertical 
direction (where the distribution of nodes is less dense). A 2141 node coarse grid obtained with the coarsening 
algorithm is depicted in Fig.7.d. In this case the coarsening follows the direction normal to the stretching 
everywhere in the mesh, as can be seen in the less dense region. The full-coarsening technique results in a 
0.80 convergence rate (Fig.7.e). The manually semi-coarsened grid has a much better convergence rate of 0.28, 
but the best convergence rate, 0.20, corresponds to the automatically semi-coarsened grid. Moreover, the 
convergence history of the vertically semi-coarsened grid shows a change of slope near the end. This means 
that the MG algorithm does not perform optimally and does not damp low frequencies correctly, whereas the 
grid semi-coarsened by the code provides a linear type of convergence. Therefore, although both semi-coarsened 
grids have similar numbers of nodes, the coarse grid obtained through the automated coarsening algorithm 
results in better convergence. 

Figure 8: Chebyshev Meshes - AR = 0.024. Panels: a. 4225 Node Fine Grid; b. 1089 Node FC Grid (M); 
c. 2145 Node SC Grid (M). 


Chebyshev Meshes. A 4225 (65 x 65) node fine grid is built where the distribution of nodes is a cosine 
function in both directions. The minimum aspect-ratio is equal to 0.024 and the maximum to 40.73 (Fig.8.a). 
This grid comprises stretched and non-stretched elements. The minimum aspect-ratio cells are essentially 
located on the boundary of the domain, while the maximum aspect-ratio cells are located in the bisectors 
and in the middle of the domain. A manually fully-coarsened 1089 (33 x 33) node grid is depicted in Fig.8.b. 
Although no natural manual semi-coarsening technique applies here, a horizontally semi-coarsened 2145 
node (33 x 65) coarse grid is built for comparison purposes (Fig.8.c). The coarsening algorithm resulted in 
a 2115 node coarse grid (Fig.8.d). It is again obvious that the semi-coarsening follows the direction normal 
to the stretching, each region being clearly separated by the bisectors. The fully-coarsened grid provided a 
convergence rate of 0.50, and 0.30 was achieved with the manually horizontally semi-coarsened grid (Fig.8.e). 
A linear type of convergence with a convergence rate of 0.12 was achieved with the grid semi-coarsened by the 
code. It is interesting to note that, despite the similar number of nodes shared by the manually horizontally 
semi-coarsened grid and the code semi-coarsened grid, they provided different results; therefore the good 
convergence rate of the code semi-coarsening technique cannot be attributed solely to the number of nodes 
on the coarse grid. 
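
The aspect-ratio bounds quoted for the Chebyshev mesh can be reproduced from the node distribution alone. 
The sketch below assumes nodes x_i = (1 - cos(pi i / n)) / 2 on the unit interval in both directions and 
takes the aspect-ratio of a cell to be the ratio of its two side lengths; these definitions and the function 
names are assumptions made for illustration. 

```python
import math


def chebyshev_nodes(n_nodes):
    """Cosine-distributed nodes on [0, 1], clustered near both ends."""
    n = n_nodes - 1
    return [0.5 * (1.0 - math.cos(math.pi * i / n)) for i in range(n_nodes)]


def aspect_ratio_range(n_nodes):
    """Smallest and largest side-length ratio over the cells of an n_nodes x n_nodes grid."""
    x = chebyshev_nodes(n_nodes)
    spacing = [b - a for a, b in zip(x, x[1:])]
    # The extreme ratios pair the smallest spacing in one direction
    # with the largest spacing in the other direction.
    return min(spacing) / max(spacing), max(spacing) / min(spacing)


print(aspect_ratio_range(65))
```

For the 65 x 65 grid this gives values close to the quoted 0.024 and 40.73. 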












5.2 Multigrid experiments 

In this section, multigrid experiments are explored in order to demonstrate the robustness of the algorithm in 
producing a sequence of grids that permit efficient MG convergence. The number of grids varies according 
to the test-case. Two sweeps of the Jacobi relaxation are performed on each level and W-cycles are employed 
since they provide a better resolution of the coarse grid, resulting in better convergence rates. A structured 
Chebyshev and an unstructured test-case are performed with both the semi- and semi/full-coarsening techniques. 



Figure 9: Multigrid Chebyshev Meshes - AR = 0.012. Panels: a. 16641 Node Fine Grid; b. 8324 Node SC Grid; 
c. 6294 Node S/FC Grid; d. SC Region; e. FC Region; f. Resulting Convergence Histories. 


The Chebyshev test-case. A 16641 (129 x 129) node fine grid is constructed with a minimum aspect-ratio 
value of 0.012 and a maximum value of 81.50 (Fig.9.a). The semi-coarsening option provides a sequence 
of 7 grids comprising 16641, 8324 (shown in Fig.9.b), 4329, 2289, 1211, 652 and 352 nodes, and the 
semi/full-coarsening technique a sequence of 6 grids comprising 16641, 6294 (shown in Fig.9.c), 2976, 1077, 
559 and 286 nodes. The respective W-cycle complexities are equal to 11 and 6 WUs. The region where the 
algorithm performs the semi-coarsening is depicted nodewise in Fig.9.d, while Fig.9.e shows where the 
full-coarsening is applied. It is clear that the semi-coarsening is applied to the highly stretched element 
region, as expected. The semi-coarsening technique results in a standard-like convergence rate of 0.15 
(Fig.9.f). When used with only 6 grids, this technique requires the coarsest grid to be converged completely, 
otherwise the process abruptly stalls at some low residual value. A convergence rate of 0.17 and a low 
complexity favor the semi/full-coarsening technique. Yet, the convergence history displays a (slight) change 
of slope. This indicates that the method is sensitive to the quality of the triangulation of the coarse grids. 
Mesh-independent convergence is the purpose of this study, and is only truly achieved with the semi-coarsening 
technique. The slightly poorer type of convergence associated with the semi/full-coarsening technique may be 
explained by the quality of the triangulation of the coarse grid. Full-coarsening in non-stretched regions 
tends to deteriorate the relative difference of aspect-ratio between the highly and non-stretched regions. 
Moreover, the addition of a 7th grid, or even converging the coarsest level, does not change the convergence. 







Figure 10: Multigrid Unstructured - Full-Coarsening - AR = 3.7 x 10 


The unstructured test-case. In this case (Fig.10.a), a grid-spacing Δy = 10“® on the body results in 
an average minimum aspect-ratio of 3.7 x 10“^. Fig.10.e and f depict zooms of the upper right 
corner and of the wake region, respectively, in order to show the different types of stretched and non-stretched 
elements that appear in these meshes. A first sequence of 4 fully-coarsened meshes is manually constructed. 
The numbers of nodes for the levels are 19366, 4955, 1270 and 335. These meshes are depicted in Fig.10.a 
to Fig.10.d. The complexity of a W-cycle is equal to 3.2 WUs. 

Panels of Figure 11: a. 9983 Node SC Grid; b. 5189 Node SC Grid; c. 2724 Node SC Grid; d. 1717 Node SC Grid; 
e. 1044 Node SC Grid; f. 589 Node SC Grid; g. Retriangulated Fine Grid; h. Original Fine Grid. 

Figure 11: Multigrid Unstructured - Semi-Coarsening - AR = 3.7 x 10“^. 

The second sequence is obtained with the semi-coarsening technique only. There are 7 meshes that have 
19366, 9983, 5189, 2724, 1717, 1044 and 589 nodes (Fig.11.a to Fig.11.f). The W-cycle complexity is equal 
to 12.5 WUs. The last sequence of meshes results from the semi/full-coarsening technique and provides 7 
meshes (Fig.12.a to Fig.12.e): they comprise 19366, 9594, 4708, 2325, 1391, 794 and 424 nodes, resulting 
in an 11 WU W-cycle complexity. The SC and S/FC methods required all coarse point sets to be retriangulated 
using the Min-Max Delaunay variant. In order to maintain favorable convergence rates, it was found that 
the fine grid needed to be retriangulated according to the same technique. This can partially be explained 
by the quality of the nestedness of all the grids, as seen in Section 3. The fine grid is not depicted here for 
these last two sequences because it would appear similar to the original (Fig.10.a). However, the difference 
between the original and retriangulated fine grids, mostly confined to wake regions, is illustrated in Fig.11.g 
and h. 



Figure 12: Multigrid Unstructured - Semi/Full-Coarsening - AR = 3.7 x 10“®. Panels: a. 9594 Node S/FC Grid; 
b. 4708 Node S/FC Grid; c. 2325 Node S/FC Grid; d. 1391 Node S/FC Grid. 

Converging the coarsest grid of the sequence of fully-coarsened grids does not change the convergence 
rate, equal to 0.80 (Fig.12.f). This indicates that the use of an additional coarser grid would not change the 
convergence. Besides, the retriangulation of the entire sequence of fully-coarsened grids does not change 
the convergence rate of the MG algorithm, whether or not the coarsest grid is converged. The 
semi/fully-coarsened and semi-coarsened grids provide a clear improvement with respect to the usual 
fully-coarsened grids, with convergence rates equal to 0.23. The semi/fully-coarsened grids demonstrate a 
better behavior than in the Chebyshev case because they are very similar to the semi-coarsened grids. Indeed, 
since most of the nodes are concentrated in the highly stretched regions, the algorithm performs essentially 
as a semi-coarsening technique. This type of mesh is more similar to exponential-type meshes than to Chebyshev 
meshes. 

Figure 13: Significant Results. 

Concluding remarks 

Fig.13 gathers the most significant results. They are separated into two different subsets. Curves 1 
and 2 represent the spectrum of convergences within which the other convergence histories must fit. Indeed, 
curve 1 shows the best convergence and curve 2 shows what is expected when the discretization subspaces 
are only node-nested. All other curves depict the convergence histories of the various test-cases that employ 
the semi-coarsening algorithm. The problem to be solved is the same for all test-cases; only the geometries of 
the discretized spaces differ. The results are straight lines with similar slopes that fall within the predicted 
range. The difference of slopes may be explained by two essential reasons. First, the boundary conditions 
of the structured and unstructured test-cases differ. It is not possible, due to the geometry, to transpose 
exactly the same boundary conditions to both types. Second, it has been shown that the nestedness of the 
subspaces influences the quality of the convergence. It cannot be expected that the unstructured grids be 
completely nested. In addition, the quality of the triangulation of each grid may also damage the convergence. 

In this paper, a new semi-coarsening algorithm relying on the discretization of the equation, which should 
enable flexible applications, has been introduced. Convergence rates similar to those for standard Cartesian 
structured non-stretched meshes have been obtained for highly stretched unstructured meshes. Finally, linear, 
hence mesh-independent, convergence rates have been demonstrated. The extension of these unstructured 
semi-coarsening techniques to the resolution of the Navier-Stokes equations is planned in the near future. 


PRECONDITIONING OPERATORS ON UNSTRUCTURED GRIDS 


S.V. Nepomnyaschikh* 
March 14, 1996 

*Computing Center, Siberian Branch Russian Academy of Sciences, 6 Lavrentiev av., 
Novosibirsk, 630090, Russia. The work was partially supported by the ISF under contract 
NPB 000, the grant DRET 93/34/401, and the Russian Basic Research Foundation grant 
93-01-01783. 


Abstract 

We consider systems of mesh equations that approximate elliptic 
boundary value problems on arbitrary (unstructured) quasi-uniform 
triangulations and propose a method for constructing optimal 
preconditioning operators. The method is based upon two approaches: (1) 
the fictitious space method, i.e., the reduction of the original problem 
to a problem in an auxiliary (fictitious) space, and (2) the multilevel 
decomposition method, i.e., the construction of preconditioners by 
decomposing functions on hierarchical meshes. The convergence rate of 
the corresponding iterative process with the preconditioner obtained 
is independent of the mesh step. The preconditioner has an optimal 
computational cost: the number of arithmetic operations required for 
its implementation is proportional to the number of unknowns in the 
problem. The construction of the preconditioning operators for 
three-dimensional problems can be done in the same way. 


1 INTRODUCTION 

Let $\Omega \subset \mathbb{R}^2$ be a domain with a piecewise smooth boundary $\Gamma$ which belongs 
to the class $C^2$ and satisfies the Lipschitz condition [18]. In the domain $\Omega$ 
we consider the boundary value problem 


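$$
\begin{aligned}
-\sum_{i,j=1}^{2} \frac{\partial}{\partial x_i}\, a_{ij}(x) \frac{\partial u}{\partial x_j} + a_0(x)\,u &= f(x), & x &\in \Omega, \\
u(x) &= 0, & x &\in \Gamma_0, \\
\frac{\partial u}{\partial N} + \sigma(x)\,u &= 0, & x &\in \Gamma_1,
\end{aligned}
\tag{1.1}
$$

where 

$$
\frac{\partial u}{\partial N} = \sum_{i,j=1}^{2} a_{ij}(x) \frac{\partial u}{\partial x_j} \cos(n, x_i)
$$

is the conormal derivative, $n$ denotes the outward normal to $\Gamma$, and $\Gamma_0$ is a 
union of a finite number of curvilinear segments, $\Gamma = \Gamma_0 \cup \Gamma_1$, $\Gamma_0 = \bar{\Gamma}_0$. Here 
$\bar{\Gamma}_0$ denotes the closure of $\Gamma_0$. 

By $H^1(\Omega, \Gamma_0)$ we denote the subspace of the Sobolev space $H^1(\Omega)$ 

$$
H^1(\Omega, \Gamma_0) = \{ v \in H^1(\Omega) \mid v(x) = 0,\ x \in \Gamma_0 \}.
$$
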
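We introduce a bilinear form $a(u,v)$ and a linear functional $l(v)$ as follows: 

$$
a(u,v) = \int_{\Omega} \Big( \sum_{i,j=1}^{2} a_{ij}(x) \frac{\partial u}{\partial x_j} \frac{\partial v}{\partial x_i} + a_0(x)\,u v \Big)\, dx + \int_{\Gamma_1} \sigma(x)\,u v \, dx,
$$

$$
l(v) = \int_{\Omega} f(x)\, v \, dx .
$$

Let us suppose that the operator coefficients and the right-hand side of problem 
(1.1) are such that the bilinear form $a(u,v)$ is symmetric, elliptic and 
continuous on $H^1(\Omega,\Gamma_0) \times H^1(\Omega,\Gamma_0)$, i.e., 

$$
a(u,v) = a(v,u) \quad \forall u, v \in H^1(\Omega,\Gamma_0),
$$

$$
\alpha_0 \|u\|^2_{H^1(\Omega)} \le a(u,u) \le \alpha_1 \|u\|^2_{H^1(\Omega)} \quad \forall u \in H^1(\Omega,\Gamma_0),
$$

and the linear functional $l(v)$ is continuous on $H^1(\Omega,\Gamma_0)$: 

$$
|l(v)| \le \alpha \|v\|_{H^1(\Omega)} \quad \forall v \in H^1(\Omega,\Gamma_0).
$$

The generalized solution $u \in H^1(\Omega,\Gamma_0)$ of problem (1.1) is, by definition, 
a solution to the projection problem [2] 

$$
u \in H^1(\Omega,\Gamma_0): \quad a(u,v) = l(v) \quad \forall v \in H^1(\Omega,\Gamma_0). \tag{1.2}
$$

It is well known that under these assumptions concerning $a(u,v)$ and $l(v)$ there 
exists a unique solution of problem (1.2). 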

Let a positive parameter $h$ be fixed (we always suppose that $h$ is 
sufficiently small). Let 

$$
\Omega^h = \bigcup_{i=1}^{M} \tau_i
$$

be a triangulation of the domain $\Omega$ ($\Omega^h$ is assumed to be a closed set). We 
suppose that $\Omega^h$ is a quasi-uniform triangulation [5], i.e., there exist positive 
constants $l_1$, $l_2$ and $s$ which are independent of $h$ and such that 

$$
l_1 h \le r_i \le l_2 h, \qquad \frac{r_i}{\rho_i} \le s, \qquad i = 1, \dots, M,
$$

where $r_i$ and $\rho_i$ are radii of circumscribed and inscribed circles for the 
triangle $\tau_i$, respectively. We also assume that the triangulation boundary $\Gamma^h$ 
approximates $\Gamma$ with an error $O(h^2)$. If $\Gamma_1 = \Gamma$, we suppose that $\Omega \subset \Omega^h$; if 
$\Gamma_0 = \Gamma$, we suppose that $\Omega^h \subset \Omega$. If $\Gamma_0 \ne \emptyset$ and $\Gamma_1 \ne \emptyset$, we make the 
following assumption: points where the boundary condition changes should be at 
triangulation nodes, $\Gamma_1 \subset \Omega^h$ and $\Gamma_0 \subset (\mathbb{R}^2 \setminus \Omega^h)$. Part of $\Gamma^h$ approximating 
$\Gamma_0$ will be denoted by $\Gamma_0^h$, and that for $\Gamma_1$ by $\Gamma_1^h$. For the triangulation $\Omega^h$ 
we define the space $H_h(\Omega^h)$ of real continuous functions which are linear on 
each triangle of $\Omega^h$ and vanish at $\Gamma_0^h$. We extend these functions on $\Omega \setminus \Omega^h$ 
by zero. 
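
The shape condition $r_i / \rho_i \le s$ is easy to check for a given list of triangles. The sketch below is 
an illustration only: it computes the circumradius and inradius from the side lengths (Heron's formula), 
and the function names and the input format, a triangle given as three coordinate pairs, are assumptions. 

```python
import math


def radius_ratio(a, b, c):
    """Circumradius divided by inradius for the triangle with vertices a, b, c."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    s = 0.5 * (la + lb + lc)                              # semi-perimeter
    area = math.sqrt(s * (s - la) * (s - lb) * (s - lc))  # Heron's formula
    circumradius = la * lb * lc / (4.0 * area)
    inradius = area / s
    return circumradius / inradius


def satisfies_shape_condition(triangles, s_max):
    """True if every triangle has circumradius / inradius <= s_max."""
    return all(radius_ratio(*t) <= s_max for t in triangles)


print(radius_ratio((0.0, 0.0), (3.0, 0.0), (0.0, 4.0)))
```

The ratio is bounded below by 2, attained by the equilateral triangle, and grows without bound as a triangle 
degenerates. 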

The solution of the projection problem 

$$
u^h \in H_h(\Omega^h): \quad a(u^h, v^h) = l(v^h) \quad \forall v^h \in H_h(\Omega^h) \tag{1.3}
$$

will be called an approximate solution of problem (1.2). Aspects of 
approximation of (1.2) by (1.3) have been thoroughly studied (see [5,14]); we do not 
consider them here. Each function $u^h \in H_h(\Omega^h)$ is put in standard 
correspondence with a real column vector $u \in \mathbb{R}^N$ whose components are values 
of the function $u^h$ at the corresponding nodes of the triangulation $\Omega^h$. Then 
(1.3) is equivalent to the system of mesh equations 

$$
\begin{aligned}
& Au = f, \\
& (Au, v) = a(u^h, v^h) \quad \forall u^h, v^h \in H_h(\Omega^h), \\
& (f, v) = l(v^h),
\end{aligned}
\tag{1.4}
$$

where $u^h$ and $v^h$ are the respective prolongations of vectors $u$ and $v$; $(f, v)$ is 
the Euclidean scalar product in $\mathbb{R}^N$. 

The main goal of this work is to construct a symmetric positive definite 
preconditioning operator $B$ for problem (1.4) so as to satisfy the inequalities 

$$
c_1 (Bu, u) \le (Au, u) \le c_2 (Bu, u) \quad \forall u \in \mathbb{R}^N, \tag{1.5}
$$

where the positive constants $c_1$ and $c_2$ are independent of $h$; the multiplication 
of a vector by $B^{-1}$ should be easy to implement. 

The preconditioner $B$ is constructed by using the method of fictitious 
space [10] in two stages. At the first stage, we pass from an arbitrary 
unstructured triangulation $\Omega^h$ to an auxiliary structured non-hierarchical mesh, 
and at the second stage to a hierarchical mesh (a square mesh on a square 
containing the original domain $\Omega$). Note that the passage from an arbitrary 
triangulation to a structured mesh was earlier used in [11]. This paper 
includes some development of [13] for the case of locally refined grids. Other 
techniques for constructing preconditioners on unstructured meshes were 
proposed in [8,9,10,17]. The construction of preconditioning operators on 
non-hierarchical grids was considered in [6]. 


2 REDUCTION TO A STRUCTURED MESH 

The preconditioning operator $B$ in (1.5) is constructed on the basis of the 
lemma of fictitious space [11]. For convenience, we give this lemma here. 

Lemma 2.1. Let $H_0$ and $H$ be Hilbert spaces with the scalar products 
$(u_0, v_0)_{H_0}$ and $(u, v)_H$, respectively. Let $A_0$ and $A$ be symmetric positive 
definite continuous operators in the spaces $H_0$ and $H$: 

$$
A_0 : H_0 \to H_0, \qquad A : H \to H.
$$

Suppose that $R$ is a linear operator such that 

$$
R : H \to H_0, \qquad (A_0 R v, R v)_{H_0} \le c_R (A v, v)_H \quad \forall v \in H,
$$

and there exists an operator $T$ such that 

$$
T : H_0 \to H, \qquad R T u_0 = u_0, \qquad c_T (A T u_0, T u_0)_H \le (A_0 u_0, u_0)_{H_0} \quad \forall u_0 \in H_0,
$$

where $c_R$ and $c_T$ are positive constants. Then 

$$
c_T (A_0^{-1} u_0, u_0)_{H_0} \le (R A^{-1} R^* u_0, u_0)_{H_0} \le c_R (A_0^{-1} u_0, u_0)_{H_0} \quad \forall u_0 \in H_0 .
$$

The operator $R^*$ is adjoint to $R$ with respect to the scalar products $(u_0, v_0)_{H_0}$ 
and $(u, v)_H$: 

$$
R^* : H_0 \to H, \qquad (R^* u_0, v)_H = (u_0, R v)_{H_0} .
$$

Note that for constructing and implementing the preconditioner, i.e., the 
operator $R A^{-1} R^*$, we only require the existence of the operator $T$. In our 
case, the role of the operator $A_0$ is played by $A$ of (1.4), and the role of the 
space $H_0$ by $H_h(\Omega^h)$. In order to use Lemma 2.1, we construct a fictitious 
(auxiliary) space and the corresponding operators. To do this, we embed 
the domain $\Omega$ in a square $\Pi$. Let $K_i$ denote the union of triangles in the 
triangulation $\Omega^h$ which have a common vertex $z_i$, and let $d_i$ be the maximum 
radius of a circle inscribed in $K_i$. In the square $\Pi$, we introduce an auxiliary 
grid $\Pi^h$ with a step size $h$ such that 

$$
h < \frac{1}{2\sqrt{2}} \min_i d_i . \tag{2.1}
$$

Let us assume that $h = l \cdot 2^{-J}$, where $l$ is the length of the sides of $\Pi$ and $J$ 
is a positive integer. We denote the nodes of the grid $\Pi^h$ by 

$$
Z_{ij} = (x_i, y_j), \qquad i, j = 0, 1, \dots, 2^J,
$$

and the cells of $\Pi^h$ by $D_{ij}$, 

$$
D_{ij} = \{ (x, y) \mid x_i \le x < x_{i+1},\ y_j \le y < y_{j+1} \}, \qquad \Pi^h = \bigcup_{i,j=0}^{2^J - 1} D_{ij} .
$$

Let $Q^h$ denote the minimum figure that consists of cells $D_{ij}$ and contains 
$\Omega^h$, $\Omega^h \subset Q^h$; let $S^h$ be the set of boundary nodes of $Q^h$. We subdivide the 
set $S^h$ into two subsets $S_0^h$ and $S_1^h$ as follows: if 

$$
D_{ij} \cap \Gamma_0 \ne \emptyset,
$$

all nodes of $D_{ij} \cap S^h$ are in $S_0^h$; 

$$
S_1^h = S^h \setminus S_0^h .
$$

Using cell diagonals, we triangulate $Q^h$ and $\Pi^h$; hereafter, the designations 
$Q^h$ and $\Pi^h$ refer to triangulations as well. Let $H_h(Q^h)$ be the space of real 
continuous functions which are linear on the triangles of $Q^h$ and vanish at 
the nodes of $S_0^h$. It is the space $H_h(Q^h)$ that will be used as the fictitious 
space in Lemma 2.1. 
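
Condition (2.1) together with the dyadic step size $h = l \cdot 2^{-J}$ determines the smallest admissible 
number of refinement levels. The following sketch assumes that the side length of the square and the 
smallest of the radii $d_i$ are already known; the function and parameter names are hypothetical. 

```python
import math


def smallest_level(side_length, d_min):
    """Smallest J such that h = side_length * 2**(-J) < d_min / (2 * sqrt(2))."""
    bound = d_min / (2.0 * math.sqrt(2.0))
    level = 0
    while side_length * 2.0 ** (-level) >= bound:
        level += 1
    return level


J = smallest_level(side_length=1.0, d_min=0.1)
print(J, 2.0 ** (-J))
```

For a unit square and a smallest inscribed radius of 0.1 the bound is about 0.0354, so J = 5 and h = 1/32. 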

We now define the projection operator $R$ 

$$
R : H_h(Q^h) \to H_h(\Omega^h),
$$

the extension operator $T$ 

$$
T : H_h(\Omega^h) \to H_h(Q^h),
$$

and an easily invertible operator $A_Q$ in the space $H_h(Q^h)$. 

Let us begin with the operator $R$. For a given mesh function 

$$
U^h(Z_{ij}) \in H_h(Q^h)
$$

we define a function $u^h \in H_h(\Omega^h)$ as follows. Let $z_l$ be a vertex in the 
triangulation $\Omega^h$; assume that $z_l \in D_{ij}$. We put 

$$
u^h(z_l) = (R U^h)(z_l) = U^h(Z_{ij}) . \tag{2.2}
$$

The function $u^h$ is equal to zero at nodes $z_l \in \Gamma_0^h$. 

Then, let us define the operator $T$. For a given function $u^h \in H_h(\Omega^h)$ 
we define a function $U^h \in H_h(Q^h)$. The function $U^h$ is equal to zero at nodes 
$Z_{ij} \in S_0^h$. At the other nodes, $U^h$ is defined as follows. If a cell $D_{ij}$ contains 
a certain vertex $z_l$ of the triangulation $\Omega^h$, we put 

$$
U^h(Z_{ij}) = (T u^h)(Z_{ij}) = u^h(z_l) .
$$

For each of the remaining nodes $Z_{ij} \in Q^h$ we find the closest vertex $z_l$ of 
the triangulation $\Omega^h$ (if there are several closest vertices, we can choose any 
of them) and put 

$$
U^h(Z_{ij}) = (T u^h)(Z_{ij}) = u^h(z_l) .
$$

Finally, in the space $H_h(Q^h)$ we define the operator $A_Q$: 

$$
(A_Q U, V) = \int_{Q^h} \big( (\nabla U^h, \nabla V^h) + U^h V^h \big)\, dx\, dy \quad \forall U^h, V^h \in H_h(Q^h), \tag{2.3}
$$

where $U^h$ and $V^h$ are the respective prolongations of the vectors $U$ and $V$. 

Theorem 2.1. There exist positive constants $c_3$ and $c_4$, independent of 
$h$, such that 

$$
c_3 (A^{-1} u, u) \le (R A_Q^{-1} R^* u, u) \le c_4 (A^{-1} u, u) \quad \forall u \in \mathbb{R}^N .
$$

Here $A$, $R$ and $A_Q$ are the operators of (1.4), (2.2) and (2.3), respectively; $R^*$ 
is the transpose of $R$ (we hereafter use the same designation for an operator 
and its matrix representation). 

Proof. The theorem easily follows from Lemma 2.1, condition (2.1) and 
the familiar equivalence of $H^1$-norms of finite-element functions in the spaces 
$H_h(\Omega^h)$, $H_h(Q^h)$ and the difference counterparts of these norms [14]. 

Remark 2.1. The implementation of the operator $R$ is equivalent to 
piecewise constant interpolation. It is easily seen that the number of 
arithmetic operations required for multiplying $R$ or $R^*$ by a vector is proportional 
to the number of nodes in the mesh domain. 
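
Remark 2.1 can be illustrated with a small sketch. The code below is a simplified, hypothetical rendering 
of $R$, its transpose and $T$ on a square grid of n x n cells with step h anchored at the origin. It assumes 
the case $\Gamma_1 = \Gamma$ (no nodes in $S_0^h$) and at most one vertex per cell, and for brevity it defines 
$T$ on all grid nodes instead of only those of $Q^h$. All function names are illustrative. 

```python
import math


def cell_of(point, h):
    """Indices (i, j) of the cell D_ij = [x_i, x_i + h) x [y_j, y_j + h) containing `point`."""
    return int(point[0] // h), int(point[1] // h)


def apply_R(U, vertices, h):
    """(R U)(z_l) = U(Z_ij) for z_l in D_ij: one lookup per vertex."""
    return [U[i][j] for i, j in (cell_of(z, h) for z in vertices)]


def apply_R_transpose(u, vertices, h, n):
    """Transpose of R: add each vertex value to the node Z_ij of its cell."""
    V = [[0.0] * (n + 1) for _ in range(n + 1)]
    for value, z in zip(u, vertices):
        i, j = cell_of(z, h)
        V[i][j] += value
    return V


def apply_T(u, vertices, h, n):
    """Extension: Z_ij takes the value of the vertex in D_ij, else of the nearest vertex."""
    owner = {cell_of(z, h): k for k, z in enumerate(vertices)}
    U = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            k = owner.get((i, j))
            if k is None:
                node = (i * h, j * h)
                # Brute-force nearest-vertex search; adequate for a sketch only.
                k = min(range(len(vertices)), key=lambda m: math.dist(node, vertices[m]))
            U[i][j] = u[k]
    return U


h, n = 0.25, 4
vertices = [(0.1, 0.1), (0.6, 0.3), (0.9, 0.8)]
u = [1.0, 2.0, 3.0]
print(apply_R(apply_T(u, vertices, h, n), vertices, h))
```

Both `apply_R` and `apply_R_transpose` touch each vertex once, in line with the operation count of the 
remark, and the example reproduces the identity $R T u = u$ required by Lemma 2.1. The operator $T$ is needed 
only for the analysis, so the cost of the nearest-vertex search in this sketch is of no practical concern. 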

Thus, the construction of a preconditioning operator on an unstructured 
triangulation is reduced to the construction of a preconditioning operator for 
$A_Q$. The latter problem is considered in Section 3. 


3 FICTITIOUS SPACE AND MULTILEVEL DECOMPOSITION METHODS 

In order to find a preconditioning operator for $A_Q$, we again use Lemma 2.1. 
Here the fictitious (auxiliary) space is $H_h(\Pi^h)$, which consists of piecewise 
linear continuous functions vanishing on the boundary $\partial\Pi$ of the square $\Pi$. 
Efficient preconditioning operators in $H_h(\Pi^h)$ are well known; in particular, 
we may use the BPX preconditioner [4]. To do so, we use the following 
construction. 

We divide the domain $\Pi \setminus \Omega$ into two non-intersecting subdomains $G_0$ and $G_1$ such 
that 

$$
\overline{\Pi \setminus \Omega} = \bar{G}_0 \cup \bar{G}_1, \qquad G_0 \cap G_1 = \emptyset, \qquad
\partial G_0 \cap \partial\Omega = \Gamma_0, \qquad \partial G_1 \cap \partial\Omega = \Gamma_1 . \tag{3.1}
$$

According to (3.1), we represent the triangulation $\Pi^h \setminus Q^h$ as a union of two 
non-overlapping parts: 

$$
\Pi^h \setminus Q^h = G_0^h \cup G_1^h,
$$

where $G_0^h$ and $G_1^h$ are mesh approximations of the domains $G_0$ and $G_1$, 
respectively. Further, we denote 

$$
G = \Omega \cup \Gamma_1 \cup G_1, \qquad G^h = Q^h \cup G_1^h,
$$

and by $H_h(G^h)$ the finite-element space of functions vanishing on $\partial G^h$. We consider in 
$\Pi$ the sequence of grids 

$$
\Pi_0^h, \Pi_1^h, \dots, \Pi_J^h = \Pi^h
$$

with step sizes 

$$
h_0 = l, \quad h_1 = l \cdot 2^{-1}, \quad \dots, \quad h_J = h = l \cdot 2^{-J} .
$$

We triangulate these grids and consider the corresponding finite-element 
spaces 

$$
W_0 \subset W_1 \subset \dots \subset W_J = H_h(\Pi^h) .
$$

By $\Phi_i^{(l)}$, $i = 1, \dots, N_l$, we denote the nodal basis of the space $W_l$, $l = 0, 1, \dots, J$. 

First, let us examine the case of $\Gamma_1 = \Gamma$; accordingly, here $S_1^h = S^h$. By 
$\Psi_i^{(l)}$ we denote the restriction of the basis function $\Phi_i^{(l)}$ onto $Q^h$. We put each 
function $U^h \in H_h(Q^h)$ in correspondence with a function $\tilde{U}^h \in H_h(\Pi^h)$: 

$$
\tilde{U}^h(Z_{ij}) = \begin{cases} U^h(Z_{ij}), & Z_{ij} \in Q^h, \\ 0, & Z_{ij} \notin Q^h . \end{cases}
$$

Define 

$$
C_N^{-1} U^h = \sum_{l=0}^{J} \ \sum_{\operatorname{supp} \Phi_i^{(l)} \cap Q^h \ne \emptyset} (\tilde{U}^h, \Phi_i^{(l)})_{L_2(\Pi)}\, \Psi_i^{(l)} \quad \forall U^h \in H_h(Q^h) .
$$

Theorem 3.1. There exist positive constants $c_5$ and $c_6$, independent of 
$h$, such that 

$$
c_5 (A^{-1} u, u) \le (R C_N^{-1} R^* u, u) \le c_6 (A^{-1} u, u) \quad \forall u \in \mathbb{R}^N .
$$

Proof. Let us define 

$$
R_N : H_h(\Pi^h) \to H_h(Q^h)
$$

to be an operator of restriction on $Q^h$: 

$$
(R_N U^h)(Z_{ij}) = U^h(Z_{ij}) \quad \forall Z_{ij} \in Q^h .
$$

If we subdivide the nodes of $\Pi^h$ into two groups: (1) the nodes of $Q^h$ 
(including those of $S^h$), and (2) the remaining nodes, then we obtain the following 
matrix representation for $R_N$ (see also [1]): 

$$
R_N = ( I \ \ O ),
$$

where $I$ is the identity matrix corresponding to nodes of group (1), and $O$ is 
the zero matrix corresponding to nodes of group (2). It is evident that 

$$
\| R_N U^h \|_{H^1(Q^h)} \le \| U^h \|_{H^1(\Pi^h)} \quad \forall U^h \in H_h(\Pi^h) .
$$

By the theorem of extension of mesh functions [6], there exists the extension 
operator 

$$
T_N : H_h(Q^h) \to H_h(\Pi^h)
$$

uniformly bounded with respect to $h$. 

According to Lemma 2.1 and [4], there exist positive constants $c_7$ and $c_8$, 
independent of $h$, such that 

$$
c_7 (A_Q^{-1} U, U) \le (R_N C_\Pi^{-1} R_N^* U, U) \le c_8 (A_Q^{-1} U, U) \quad \forall U^h \in H_h(Q^h),
$$

where $A_Q$ is the operator of (2.3) and the definition of $C_\Pi^{-1}$ is 

$$
C_\Pi^{-1} U^h = \sum_{l=0}^{J} \sum_{i=1}^{N_l} (U^h, \Phi_i^{(l)})_{L_2(\Pi)}\, \Phi_i^{(l)} \quad \forall U^h \in H_h(\Pi^h) .
$$

Taking into account the explicit form of $R_N$, we complete the proof of 
Theorem 3.1. 
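
A multilevel sum of the BPX type, $\sum_l \sum_i (U, \Phi_i^{(l)}) \Phi_i^{(l)}$, can be applied in a number of 
operations proportional to the number of unknowns by passing between levels recursively. The following 
one-dimensional sketch is an illustration only and rests on several assumptions: a uniform grid on an 
interval with $2^J - 1$ interior nodes, homogeneous Dirichlet data, and an input vector that already holds the 
moments $(U, \Phi_i^{(J)})$ on the finest level. In one dimension the level terms would normally be rescaled 
by the step size, which is omitted here so that the code mirrors the unscaled two-dimensional sum. The 
function names are hypothetical. 

```python
def restrict(fine):
    """Moments against the next coarser hat functions (transpose of linear interpolation)."""
    n_coarse = (len(fine) - 1) // 2
    return [fine[2 * c + 1] + 0.5 * (fine[2 * c] + fine[2 * c + 2]) for c in range(n_coarse)]


def prolong(coarse):
    """Linear interpolation of coarse nodal coefficients to the next finer level."""
    fine = [0.0] * (2 * len(coarse) + 1)
    for c, value in enumerate(coarse):
        fine[2 * c + 1] += value
        fine[2 * c] += 0.5 * value
        fine[2 * c + 2] += 0.5 * value
    return fine


def multilevel_sum(moments):
    """Finest-level nodal coefficients of the sum over all levels l and all basis functions i."""
    levels = [list(moments)]
    while len(levels[-1]) > 1:
        levels.append(restrict(levels[-1]))       # moments on coarser and coarser levels
    acc = levels[-1]
    for level_moments in reversed(levels[:-1]):   # accumulate from coarse to fine
        acc = [p + q for p, q in zip(prolong(acc), level_moments)]
    return acc


print(multilevel_sum([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
```

Each level is visited once on the way down and once on the way up, and the level sizes form a geometric 
sequence, so the total work is proportional to the number of finest-level nodes. 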

Then, let us examine the case of the Dirichlet problem, i.e., $\Gamma_0 = \Gamma$ and, 
accordingly, $S_0^h = S^h$. We define the preconditioner as follows: 

$$
C_D^{-1} U^h = \sum_{l=0}^{J} \ \sum_{\operatorname{supp} \Phi_i^{(l)} \subset Q^h} (U^h, \Phi_i^{(l)})_{L_2(Q^h)}\, \Phi_i^{(l)} \quad \forall U^h \in H_h(Q^h) .
$$

Theorem 3.2. There exist positive constants $c_9$ and $c_{10}$, independent of 
$h$, such that 

$$
c_9 (A^{-1} u, u) \le (R C_D^{-1} R^* u, u) \le c_{10} (A^{-1} u, u) \quad \forall u \in \mathbb{R}^N .
$$

Proof. In this case, the equivalence of the operators $A_Q$ and $C_D$ easily 
follows from the multilevel technique [3,4,15,16] and can be shown, for 
instance, by using quasi-interpolants from [12]. Then, from Theorem 2.1 we 
get the assertion of Theorem 3.2. 

Finally, we examine the case of mixed boundary conditions, i.e., $\Gamma_0 \ne \emptyset$ 
and $\Gamma_1 \ne \emptyset$. We denote 

$$
C_M^{-1} U^h = \sum_{l=0}^{J} \ \sum_{\substack{\operatorname{supp} \Phi_i^{(l)} \subset G^h \\ \operatorname{supp} \Phi_i^{(l)} \cap Q^h \ne \emptyset}} (\tilde{U}^h, \Phi_i^{(l)})_{L_2(G^h)}\, \Psi_i^{(l)} \quad \forall U^h \in H_h(Q^h) .
$$

Theorem 3.3. There exist positive constants $c_{11}$ and $c_{12}$, independent of 
$h$, such that 

$$
c_{11} (A^{-1} u, u) \le (R C_M^{-1} R^* u, u) \le c_{12} (A^{-1} u, u) \quad \forall u \in \mathbb{R}^N .
$$

Proof. The theorem is proved by using the argument of Theorem 3.2 
and then that of Theorem 3.1. Indeed, at the first step, let us 'extend' the 
Dirichlet boundary condition from $S_0^h$ to the boundary of the triangulation 
$\Pi^h$. To do so, we consider the finite-element space $H_h(G^h)$ and define 

$$
C_G^{-1} U^h = \sum_{l=0}^{J} \ \sum_{\operatorname{supp} \Phi_i^{(l)} \subset G^h} (U^h, \Phi_i^{(l)})_{L_2(G^h)}\, \Phi_i^{(l)} \quad \forall U^h \in H_h(G^h) .
$$

Then, according to Theorem 3.2, there exist positive constants $c_{13}$, $c_{14}$, 
independent of $h$, such that 

$$
c_{13} \| U^h \|^2_{H^1(G^h)} \le (C_G U, U) \le c_{14} \| U^h \|^2_{H^1(G^h)} \quad \forall U^h \in H_h(G^h) .
$$

At the second step, define 

$$
R_{N,G} : H_h(G^h) \to H_h(Q^h)
$$

as a restriction on $Q^h$ from $G^h$: 

$$
(R_{N,G} U^h)(Z_{ij}) = U^h(Z_{ij}) \quad \forall Z_{ij} \in Q^h .
$$

Then, from Lemma 2.1 we get 

$$
c_{15} (A_Q^{-1} U, U) \le (R_{N,G} C_G^{-1} R_{N,G}^* U, U) \le c_{16} (A_Q^{-1} U, U) \quad \forall U^h \in H_h(Q^h),
$$

where $c_{15}$, $c_{16}$ are independent of $h$. Using again the explicit form of $R_{N,G}$, 
we complete the proof of Theorem 3.3. 


4 LOCALLY REFINED GRIDS 

In this section we consider a triangulation of the domain $\Omega$ 

$$
\Omega^h = \bigcup_{i=1}^{M} \tau_i
$$

and assume $\Omega^h$ is regular but not quasi-uniform, i.e., there exists a constant 
$s$, independent of $h$, such that 

$$
\frac{r_i}{\rho_i} \le s, \qquad i = 1, \dots, M,
$$

where $r_i$ and $\rho_i$ are radii of circumscribed and inscribed circles for the triangle 
$\tau_i$, respectively. It means that $\Omega^h$ can be locally refined. For this triangulation 
$\Omega^h$, we define the space $H_h(\Omega^h)$ of real continuous functions which are linear 
on each triangle $\tau_i$ of $\Omega^h$. For the sake of simplicity, we consider the Dirichlet 
boundary condition and assume that the functions from $H_h(\Omega^h)$ vanish at 
$\Gamma^h$. 

If we introduce a uniform fictitious grid $Q^h$, then it is possible to modify 
the operators $R$ and $T$ from Section 2 for the locally refined triangulation $\Omega^h$, 
but the realization of a preconditioner will be expensive. 

Let us embed the domain $\Omega$ in a square $\Pi$ and start with a coarse uniform 
grid $\Pi_0^h$. We refine $\Pi_0^h$ several times 

it' it' 

The grid $\Pi_0^h$ consists of cells $D_{ij}^{(0)}$. Let $Q_0^h$ denote the minimum figure that 
consists of cells $D_{ij}^{(0)}$ and contains $\Omega^h$. Denote by $I_0$ a set of indices $(i,j)$ such 
that 

$$
Q_0^h = \bigcup_{(i,j) \in I_0} D_{ij}^{(0)} .
$$

We define grids $Q_1^h, Q_2^h, \dots$ in the following way. Denote by $I_l$ a set of indices 
$(i,j)$ such that the cell $D_{ij}^{(l)}$ contains more than one vertex of the triangulation 
$\Omega^h$. We divide $D_{ij}^{(l)}$ and all neighboring cells (which have at least one common 
node with $D_{ij}^{(l)}$) into four congruent subcells by connecting the midpoints 
of the edges. Denote the new cells by $D_{ij}^{(l+1)}$ and the resulting grid by $Q_{l+1}^h$, 
$l = 0, 1, \dots$, which is the minimum figure that contains $\Omega^h$. We stop this process 
when each cell contains no more than one vertex of $\Omega^h$. Denote by $Q_J^h$ the 
final grid. 
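
The stopping rule of this refinement, at most one vertex per cell, can be sketched with a simple recursive 
subdivision. The code below is a simplified illustration and not the construction of the paper: it refines 
only the cells that contain more than one vertex, omits the additional refinement of neighboring cells, and 
keeps all cells of the square rather than the minimum figure covering $\Omega^h$. The names and the depth 
limit are assumptions. 

```python
def vertices_in(cell, points):
    """Vertices lying in the half-open square cell (x0, y0, size)."""
    x0, y0, size = cell
    return [p for p in points if x0 <= p[0] < x0 + size and y0 <= p[1] < y0 + size]


def refine(cell, points, max_depth=20):
    """Split a cell into four congruent subcells until it holds at most one vertex."""
    inside = vertices_in(cell, points)
    if len(inside) <= 1 or max_depth == 0:
        return [cell]
    x0, y0, size = cell
    half = 0.5 * size
    leaves = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            leaves.extend(refine((x0 + dx, y0 + dy, half), inside, max_depth - 1))
    return leaves


points = [(0.1, 0.1), (0.15, 0.1), (0.8, 0.8)]
cells = refine((0.0, 0.0, 1.0), points)
print(len(cells), min(size for _, _, size in cells))
```

Cells are refined only where vertices cluster, so the number of cells stays comparable to the number of 
vertices instead of being dictated by the smallest triangle, which is the motivation given above for not 
using a uniform fictitious grid. 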

Define a finite-element space $H_h(Q_J^h)$ as follows: 

HM’') = { E E E 1 4' € m.} 

We now define the projection operator $R$ 

$$
R : H_h(Q_J^h) \to H_h(\Omega^h),
$$

the extension operator $T$ 

$$
T : H_h(\Omega^h) \to H_h(Q_J^h)
$$

according to the definitions from Section 2. 

Define a preconditioning operator in $H_h(Q_J^h)$ in the following way: 

Q'c/‘= E E E 

snppfb^°'>cQ>} ‘=° b-dehsupp^j^'+i^nD^PT^e 

for any $U^h \in H_h(Q_J^h)$. 

Theorem 4.1. There exist positive constants $c_{17}$ and $c_{18}$, independent of 
$h$, such that 

$$
c_{17} (A^{-1} u, u) \le (R C^{-1} R^* u, u) \le c_{18} (A^{-1} u, u) \quad \forall u \in \mathbb{R}^N .
$$

Proof. In this case, we again use the equivalence of $H^1$-norms of 
finite-element functions in the spaces $H_h(\Omega^h)$, $H_h(Q_J^h)$ and the difference 
counterparts of these norms, together with the multilevel technique. 


MULTIGRID METHODS FOR EHL PROBLEMS 


Elyas Nurgat and Martin Berzins 
School of Computer Studies, University of Leeds 
Leeds, LS2 9JT, UK 


INTRODUCTION 

In many bearings and contacts, forces are transmitted through thin continuous fluid films which 
separate two contacting elements. Objects in contact are normally subjected to friction and wear, 
which can be reduced effectively by using lubricants. If the lubricant film is sufficiently thick to 
prevent the opposing solids from coming into contact and carries the entire load, then we have 
hydrodynamic lubrication, where the lubricant film is determined by the motion and geometry of 
the solids. However, for loaded contacts of low geometrical conformity, such as gears, rolling 
contact bearings and cams, the high pressures mean that this is no longer the case; this regime is 
referred to as Elasto-Hydrodynamic Lubrication (EHL) (ref. 1). In EHL, elastic deformation of the contacting 
elements and the increase in fluid viscosity with pressure are very significant and cannot be ignored. 

Since the deformation results in changing the geometry of the lubricating film, which in turn 
determines the pressure distribution, an EHL mathematical model must simultaneously satisfy the 
complex elasticity (integral) and the Reynolds lubrication (differential) equations. The nonlinear 
and coupled nature of the two equations makes numerical calculations computationally intensive. 
This is especially true for highly loaded problems found in practice. One novel feature of these 
problems is that the solution may exhibit sharp pressure spikes in the outlet region (ref. 1). 

To date, both finite element and finite difference methods have been used to solve EHL 
problems, with perhaps greater emphasis on the finite difference approach. In both cases, 
a major computational difficulty is ensuring convergence of the nonlinear equation solver to a 
steady state solution. Two successful methods for achieving this are direct iteration and multigrid 
methods. 

Direct iteration methods (e.g. Gauss-Seidel) have long been used (e.g. Hamrock and Dowson 
(ref. 2)) in conjunction with finite difference discretizations on regular meshes. Perhaps one of the 
best examples of the application of such methods is the recent Effective Influence Method of 
Dowson and Wang (ref. 3). Multigrid methods have also been used with great success by Venner 
(ref. 4) and Venner and Lubrecht (ref. 5), with a good summary being given by Venner (ref. 6). 

As both of these finite difference based approaches appear to provide an efficient 
way of solving EHL problems, it is important to understand their relative merits. This paper is a 
first attempt at providing such an understanding in the context of the EHL point contact problem 
(contact of two spheres), in which the contact zone is a point for unloaded dry contact and an 
ellipse or circle for loaded dry contact. Since the film thickness and the contact width are generally 
small compared to the local radius of curvature of the two surfaces, the geometry of the 
surfaces in the contact area can be accurately approximated by the contact between a paraboloid 
and a flat surface. 

The layout of the remainder of this paper is as follows. In Section 2 we introduce the form of the 
equations to be solved. The Effective Influence Newton Method is described in Section 3, while 
Section 4 describes the multigrid method to be used. Sections 5 and 6 describe the test problems 
to be used in the comparison between the two methods and compare the performance of the two 
methods. Section 7 concludes the paper with a discussion of the two methods and suggests some 
future research directions. 


GOVERNING EQUATIONS 


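The mathematical model describing the isothermal axisymmetric EHL circular contact problem 
consists of three equations. The Reynolds Equation relates the pressure, P, to the geometry of the 
gap (the film thickness, H) and the velocities of the running surfaces: 

$$\frac{\partial}{\partial x}\left(\epsilon \frac{\partial P}{\partial x}\right) + \frac{\partial}{\partial y}\left(\epsilon \frac{\partial P}{\partial y}\right) - \frac{\partial (\rho H)}{\partial x} = 0, \qquad (x,y) \in [-3.5, 1.5] \times [-2, 2], \tag{1}$$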


with the cavitation condition $P \ge 0$ and $P = 0$ on the boundaries. The function $\epsilon = \rho H^3/(\eta\lambda)$ 
depends on the viscosity, $\eta(P)$, the density, $\rho(P)$, and the film thickness, $H(x,y)$. The remaining 
terms are given by: 

$$\rho = \begin{cases} 1 + \dfrac{\mu\, p_h P}{1 + \nu\, p_h P}, & \text{if } P > 0 \\ 1, & \text{otherwise} \end{cases} \qquad \text{(ref. 6)};$$

$$\eta = \exp\left\{ \frac{\alpha p_0}{z}\left[-1 + \left(1 + \frac{p_h}{p_0} P\right)^{z}\right] \right\} \qquad \text{(ref. 6)};$$

$p_h$ is the maximum Hertzian pressure; 

$\alpha$ is the pressure viscosity coefficient and $z = 0.68$ is the pressure viscosity index; 

$\lambda$ and $p_0 = 1.98 \times 10^{8}$ are constants; 

$\mu = 5.8 \times 10^{-10}$ and $\nu = 1.68 \times 10^{-9}$ are empirical constants; 

L and M are the Moes (ref. 6) dimensionless material and load parameters, respectively. For 
lightly loaded problems $p_h$, which is a function of M and L, is about 0.5 GPa. Moderately loaded 
problems have $p_h$ of about 1 GPa. 
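
The density and viscosity relations above can be sketched in a few lines. This is a minimal illustration, assuming that pressures are supplied in dimensionless form, that $p_h$ and $p_0$ are in Pa, and that negative pressures are clipped to zero before the viscosity is evaluated; the function names are hypothetical and not taken from the paper.

```python
import math

MU = 5.8e-10   # empirical density constant (from the text)
NU = 1.68e-9   # empirical density constant (from the text)
Z = 0.68       # pressure viscosity index (from the text)
P0 = 1.98e8    # constant p_0 (from the text), assumed to be in Pa


def density(P, p_h):
    """Dimensionless density rho(P); equal to 1 where P <= 0 (cavitated)."""
    if P <= 0.0:
        return 1.0
    return 1.0 + MU * p_h * P / (1.0 + NU * p_h * P)


def viscosity(P, p_h, alpha):
    """Dimensionless viscosity eta(P); P is clipped at zero (assumption)."""
    P = max(P, 0.0)
    return math.exp((alpha * P0 / Z) * (-1.0 + (1.0 + p_h * P / P0) ** Z))


def epsilon(P, H, p_h, alpha, lam):
    """Coefficient epsilon = rho * H^3 / (eta * lambda) of the Reynolds equation."""
    return density(P, p_h) * H ** 3 / (viscosity(P, p_h, alpha) * lam)
```

With the parameters quoted later for Test Problem Two ($p_h$ = 0.58 GPa, $\alpha = 1.7 \times 10^{-8}$, $\lambda = 0.2$), the viscosity at $P = 1$ is roughly three orders of magnitude larger than at $P = 0$, which is why $\epsilon$ becomes so small in the contact region.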

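The Film Thickness Equation gives the film thickness, $H(x,y)$, including the elastic distortion of 
the surfaces caused by the pressure in the film, and is written as: 

$$H(x,y) = H_{00} + \frac{x^2}{2} + \frac{y^2}{2} + \frac{2}{\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{P(x',y')\,dx'\,dy'}{\sqrt{(x-x')^2 + (y-y')^2}}, \tag{2}$$

where $H_{00}$ is a constant. 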

The final equation is the Force Balance Equation, which ensures that the integral of the 
pressure balances the externally applied load: 

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(x,y)\,dx\,dy = \text{External Force}. \tag{3}$$

The nondimensionalisation employed allows the external force to be scaled to $2\pi/3$. 

Finite Difference Discretization of Governing Equations 


The focus of this study is on the iterative solution methods for the nonlinear equations and so, 
in order to allow comparison with existing results, we shall follow most EHL studies and use a 
regular mesh. The governing equations are discretized on a regular rectangular grid with the 
direction of flow in the x direction and the mesh spacings $h_x$ and $h_y$ in the x and y directions, 
respectively. Due to symmetry, only half the domain is used in the y direction. Reynolds 
Equation (1) is discretized at each non-boundary mesh point $((i-1)h_x + x_a,\ (j-1)h_y - y_c)$, 
where $(x,y) \in [x_a, x_b] \times [-y_c, y_c]$, using central and backward differencing to get (ref. 6): 

$$\epsilon_{i-\frac12,j}(P_{i-1,j} - P_{i,j}) + \epsilon_{i+\frac12,j}(P_{i+1,j} - P_{i,j}) + \frac{h_x^2}{h_y^2}\left[\epsilon_{i,j-\frac12}(P_{i,j-1} - P_{i,j}) + \epsilon_{i,j+\frac12}(P_{i,j+1} - P_{i,j})\right] - h_x(\rho_{i,j}H_{i,j} - \rho_{i-1,j}H_{i-1,j}) = 0, \tag{4}$$

where $\epsilon_{i\pm\frac12,j}$ and $\epsilon_{i,j\pm\frac12}$ denote the values of $\epsilon$ at the intermediate locations midway between 
meshpoints. 

The discretized film thickness equation (2) at a point $(i,j)$ is given by: 

$$H_{i,j} = H_{00} + \frac{x_i^2}{2} + \frac{y_j^2}{2} + d_{i,j}, \tag{5}$$

where $H_{00}$ is a constant and $d_{i,j}$ is the elastic deformation of the material due to the applied load as 
defined below. 


Elastic Deformation Integral 

The elastic deformation on the surface of a solid depends on the representation of applied 
normal pressures. The simplest procedure is to divide the pressure distribution into rectangular 
blocks of uniform pressure. The elastic deformation at a point $(x,y)$, $d_{x,y}$, due to the uniform 
pressure P over the rectangular area $2a \times 2b$ is given by (ref. 6): 

$$d_{x,y} = \frac{2P}{\pi^2}\int_{-b}^{b}\int_{-a}^{a} \frac{dx_1\,dy_1}{\sqrt{(x - x_1)^2 + (y - y_1)^2}}. \tag{6}$$

If the entire domain is divided into equal rectangular areas, then from Dowson and Hamrock 
(ref. 7), the elastic deformation at a point $(i,j)$, $d_{i,j}$, due to contributions of all rectangular areas of 
uniform pressure is given by: 

$$d_{i,j} = \frac{2}{\pi^2}\sum_{k=1}^{m_x}\sum_{l=1}^{n_y} K_{m,n} P_{k,l}, \tag{7}$$

where $m = |i - k| + 1$, $n = |j - l| + 1$, and $m_x$ and $n_y$ are the maximum number of points in the x and y 
directions, respectively. The coefficients $K_{m,n}$ are given by: 



\Xp+y/y'^+X^J 


+ la:,! In 


/ yq + V^l+Vq \ 
\yp+^/^i+yl) 
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where 

Xj, = Xi-Xk + ^ X, = Xi-Xk-^ yp=yj-yi + ^ y, = y • - y, - ^ . 

One advantage of a regular mesh is that the m^Uy coefficients need only be calculated once and 
stored. In contrast, on an irregular mesh it is necessary to store m^Uy coefficients for each mesh 
point. 
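
The "calculate once and store" property can be illustrated with a short sketch. It assumes that each coefficient is the integral of $1/r$ over one $h_x \times h_y$ cell, as in equation (6), and evaluates it with the signed closed form obtained by integrating twice. That closed form is an assumption made for illustration and is not claimed to be the exact expression printed in the paper; the function names are hypothetical.

```python
import math


def kernel(di, dj, hx, hy):
    """Integral of 1/r over an hx-by-hy cell whose centre is di cells away
    in x and dj cells away in y from the evaluation point."""
    xp, xq = (di + 0.5) * hx, (di - 0.5) * hx
    yp, yq = (dj + 0.5) * hy, (dj - 0.5) * hy

    def F(u, v):
        # Mixed antiderivative: d^2 F / (du dv) = 1 / sqrt(u^2 + v^2).
        # Half-cell offsets keep u and v away from zero, so the logs are defined.
        r = math.hypot(u, v)
        return u * math.log(v + r) + v * math.log(u + r)

    return F(xp, yp) - F(xq, yp) - F(xp, yq) + F(xq, yq)


def kernel_table(mx, ny, hx, hy):
    """On a regular mesh the coefficient depends only on |i-k| and |j-l|,
    so mx*ny values are computed once and reused for every mesh point."""
    return [[kernel(m, n, hx, hy) for n in range(ny)] for m in range(mx)]
```

For the cell containing the evaluation point the result equals $4h\,\mathrm{arcsinh}(1)$ on a square cell of side $h$, and far from the cell it approaches (cell area)/(distance), which gives two quick sanity checks.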

The force balance equation (3) determines the value of the integration constant $H_{00}$ and is 
discretized as follows: 

$$h_x h_y \sum_{i=1}^{m_x}\sum_{j=1}^{n_y} P_{i,j} = \frac{2\pi}{3}. \tag{8}$$

The system of equations (4), (7) and (8) thus constitutes a system of integro-differential 
equations. The initial pressure distribution is given by the Hertzian pressure profile (ref. 6), that 
is, $P = \sqrt{1 - x^2 - y^2}$ if $x^2 + y^2 < 1$, otherwise $P = 0$. 
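
As a quick consistency check, the Hertzian initial guess should already satisfy the discrete force balance (8) approximately, since the integral of the profile over the unit disc is $2\pi/3$. The sketch below assumes the full domain $[-3.5, 1.5] \times [-2, 2]$ and the 151 by 81 mesh used later in Test Problem One; the function names are illustrative only.

```python
import math


def hertz_pressure(x, y):
    """Dimensionless Hertzian pressure: sqrt(1 - x^2 - y^2) inside the unit disc."""
    s = 1.0 - x * x - y * y
    return math.sqrt(s) if s > 0.0 else 0.0


def discrete_load(mx, ny, xa=-3.5, xb=1.5, yc=2.0):
    """Left-hand side of the discrete force balance: hx * hy * sum of P_ij."""
    hx = (xb - xa) / (mx - 1)
    hy = 2.0 * yc / (ny - 1)
    total = 0.0
    for i in range(mx):
        for j in range(ny):
            total += hertz_pressure(xa + i * hx, -yc + j * hy)
    return hx * hy * total
```

Calling `discrete_load(151, 81)` gives a value close to $2\pi/3 \approx 2.0944$, the target that the SumP column in the result tables approaches.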


EFFECTIVE INFLUENCE NEWTON METHOD (ref. 8) 


For EHL problems, when Newton's method is used, the discretized nonlinear equation is 
linearized and solved using Gaussian elimination or an iterative method. Gaussian elimination may 
be used if the dimension of the coefficient (Jacobian) matrix of the linear system is small. 
For EHL problems, a full Jacobian matrix is required because the elastic deformation at one point 
is determined by the pressure distribution over the entire grid. For a mesh of $m_x \times n_y$ points, this 
results in an often prohibitively large dense system of $m_x n_y$ equations. It is thus essential to seek 
computationally less expensive methods. 

The Effective Influence Newton Method developed by Wang (ref. 8) to solve EHL problems is 
a variant of Newton's method for solving nonlinear equations. This method employs the notion of 
effective influence to determine the contribution from elastic deformation in the solution of the set 
of approximate linear equations used in Newton's formulation of the EHL problem. The elastic 
deformation at a point $(i,j)$ is, and must be, determined by the pressure distribution over the entire 
domain, though the contribution decreases radially outwards. However, when obtaining the 
solution of the linearized Reynolds equation by Newton's method, pressures not close to the point 
$(i,j)$ can be ignored. 

The elastic deformation at a point $(i,j)$ due to a rectangular area of uniform pressure at some 
other point is strongly influenced by the distance between the two points. This enables us to define 
an effective influence region such that only the pressures within the region need to be considered 
when solving the approximate linearized Reynolds equation. This results in a banded, rather than 
full, Jacobian matrix, thus reducing the computational work involved in the EHL calculation. 

Suppose $\tilde{P}$ is an approximation to the true solution $P$. Then, at a point $(i,j)$, $L_{i,j} = L(\tilde{P})_{i,j} \neq 0$ 
and $\bar{L}_{i,j} = L(P)_{i,j} = 0$. Taylor's theorem gives: 

$$\bar{L}_{i,j} = L_{i,j} + \sum_{l=1}^{n_y}\sum_{k=1}^{m_x} \frac{\partial L_{i,j}}{\partial P_{k,l}}\,\Delta P_{k,l} + O\left((\Delta P)^2\right), \tag{9}$$

where $L_{i,j}$ is the discretized Reynolds equation (1) at the point $(x_i, y_j)$. 

If $m_i$ and $n_j$ are the numbers of effective points from the point $(i,j)$ in the x and y 
directions, respectively, then the Effective Influence Newton formula is of the form: 

$$\sum_{l=j-n_j}^{j+n_j}\ \sum_{k=i-m_i}^{i+m_i} \frac{\partial L_{i,j}}{\partial P_{k,l}}\,\Delta P_{k,l} + L_{i,j} = 0. \tag{10}$$


The simplest form of the Effective Influence Newton method makes use of five adjacent nodal 
points in linearizing the original Reynolds Equation. This is the method employed by Dowson and 
Wang (ref. 3) in solving EHL problems and is of the form: 

$$\frac{\partial L_{i,j}}{\partial P_{i-1,j}}\Delta P_{i-1,j} + \frac{\partial L_{i,j}}{\partial P_{i,j}}\Delta P_{i,j} + \frac{\partial L_{i,j}}{\partial P_{i+1,j}}\Delta P_{i+1,j} = -L_{i,j} - \frac{\partial L_{i,j}}{\partial P_{i,j-1}}\Delta P^{new}_{i,j-1} - \frac{\partial L_{i,j}}{\partial P_{i,j+1}}\Delta P^{old}_{i,j+1}. \tag{11}$$

For a constant j, equation (11) results in a tridiagonal system of equations which are solved 
simultaneously using I-line relaxation, provided that $\Delta P^{new}_{i,j-1}$ and $\Delta P^{old}_{i,j+1}$ are known. In every 
iteration the correction term $\Delta P_{i,j}$ is evaluated on the entire grid. Having obtained $\Delta P$, a new 
approximation $P^{new}_{i,j}$ is computed from the current approximation $P_{i,j}$ on the entire grid using: 

$$P^{new}_{i,j} = P_{i,j} + W \Delta P_{i,j}, \tag{12}$$

where W is a damping factor in the range 0.09 to 0.2. 
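
The line solve and damped update described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tridiagonal coefficients are taken as already assembled, the standard Thomas algorithm is used for the solve (the paper does not say which tridiagonal solver was used), the correction is added with the sign convention implied by equation (10), and negative pressures are clipped to zero to mimic the cavitation treatment. All names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[i] multiplies x[i-1] and upper[i] multiplies
    x[i+1]; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def damped_update(P_line, dP_line, W=0.1):
    """Apply the damped correction on one line; clip at zero (cavitation)."""
    return [max(0.0, p + W * dp) for p, dp in zip(P_line, dP_line)]
```

With W between 0.09 and 0.2 only a small fraction of each Newton correction is applied, which trades speed for robustness on this strongly nonlinear problem.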

The new values of pressure are then used to calculate the elastic deformation, $d_{i,j}$, and the film 
thickness constant, $H_{00}$, of the film thickness equation (5). $H_{00}$ is updated using the force balance 
equation (8) and is given by: 

$$H_{00} \leftarrow H_{00} - c\left(\frac{2\pi}{3} - h_x h_y \sum_{i=1}^{m_x}\sum_{j=1}^{n_y} P_{i,j}\right), \tag{13}$$


where c is a small constant taken, here as 10 ^. 

The technique employed to analyze the convergence of the solution is based on the change in 
the solution from one iteration to the next. Thus, the ERROR on iteration k is given by: 

$$\text{ERROR} = \frac{\sum_{i=1}^{m_x}\sum_{j=1}^{n_y} \left|P^{k}_{i,j} - P^{k-1}_{i,j}\right|}{\sum_{i=1}^{m_x}\sum_{j=1}^{n_y} P^{k}_{i,j}}, \tag{14}$$

and the iteration is terminated when ERROR < TOL, where TOL is a user supplied tolerance. 
The results of Dowson and Wang (ref. 3) and Wang (ref. 8) show that the method works well for many 
different types of EHL problems. 
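
The two bookkeeping steps of each iteration, the film thickness constant update and the convergence measure, are simple enough to sketch directly. The sketch assumes the pressure is held as a list of rows, and that the constant is lowered when the computed load falls short of $2\pi/3$ (so that the pressure can build up); that sign convention is an assumption made here, and the function names are illustrative.

```python
import math


def relative_change(P_new, P_old):
    """Sum of absolute pressure changes divided by the sum of new pressures."""
    num = sum(abs(a - b)
              for row_new, row_old in zip(P_new, P_old)
              for a, b in zip(row_new, row_old))
    den = sum(a for row in P_new for a in row)
    return num / den


def update_h00(h00, P, hx, hy, c):
    """Relax the film thickness constant towards force balance."""
    load = hx * hy * sum(a for row in P for a in row)
    return h00 - c * (2.0 * math.pi / 3.0 - load)
```

A driver loop would call `update_h00` after each pressure sweep and stop once `relative_change` drops below the user supplied tolerance.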


MULTIGRID METHOD 

The use of multigrid methods in solving EHL problems is relatively new. The method was 
introduced into the field of tribology by Lubrecht (ref. 9), who through his extensive work has 
made multigrid an important tool for solving EHL problems. The use of 
multigrid for solving EHL line and point contact problems has been described by Venner (ref. 6). 

The concept of multigrid iteration depends on the asymptotic nature of the errors associated with 
iterative schemes and on how the schemes reduce these errors. Smooth error components, associated 
with low frequencies, are hardly reduced by the classical iterative schemes, which therefore take a long 
time to converge. The opposite is true for error components with wavelength of the order of the 
meshsize. However, low frequency error components can be adequately represented on coarser 
grids. In a multilevel solver, which makes use of a series of coarser grids, the error is relaxed on 
each grid until it becomes smooth, at which point the procedure is switched onto a coarser grid. 
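
The smoothing property described above can be seen on a model problem. The sketch below is not the EHL relaxation: it applies weighted Jacobi (weight 2/3, an illustrative choice) to the error of the one-dimensional Laplace equation with zero boundary values, and compares how a smooth and an oscillatory error mode decay.

```python
import math


def weighted_jacobi(e, sweeps, omega=2.0 / 3.0):
    """Relax the error of -u'' = 0 with fixed (zero) end values."""
    for _ in range(sweeps):
        new = e[:]
        for i in range(1, len(e) - 1):
            new[i] = (1.0 - omega) * e[i] + omega * 0.5 * (e[i - 1] + e[i + 1])
        e = new
    return e


def mode(n, k):
    """Fourier error mode k on a grid with n intervals."""
    return [math.sin(k * math.pi * i / n) for i in range(n + 1)]


def amplitude(e):
    return max(abs(v) for v in e)


n = 64
smooth_after = amplitude(weighted_jacobi(mode(n, 1), 10))   # low frequency
rough_after = amplitude(weighted_jacobi(mode(n, 48), 10))   # high frequency
```

After ten sweeps the oscillatory mode is reduced by many orders of magnitude while the smooth mode is almost untouched, which is the motivation for moving the remaining smooth error to a coarser grid.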


Full Approximation Scheme 

The FDMG multigrid software of Gareth Shaw (ref. 10) is used as a starting point for implementing 
the multigrid technique. FDMG employs the multigrid Full Approximation Scheme (FAS) to solve 
nonlinear systems of partial differential equations using either a V or a W coarse grid correction cycle. 
Either the Jacobi or the Gauss-Seidel iterative method can be used as a smoother, and restriction 
is by either injection or full weighting. 

EHL problems are nonlinear, so when using multigrid the standard Correction Scheme cannot 
be used; instead the Full Approximation Scheme must be used. In the cavitation region, in 
which negative pressures are computed by the solver, the Reynolds equation is not valid and the 
computed pressures are set to zero in the standard manner (ref. 6). This is treated within the 
multigrid method by using injection in and near the cavitation region when transferring the 
residual and solution to the coarse grid. Full weighting is used in the remaining part of the domain. 
The elastic deformation and the force balance equation are updated on each grid using the 
updated pressure values. The only substantial modification to FDMG has been to take symmetry 
boundary conditions and cavitation into consideration. The main difference from the scheme of 
Venner (ref. 6) is that he uses a combination of Jacobi and Gauss-Seidel rather than the Gauss-Seidel 
scheme used here. 
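
The mixed restriction described above can be sketched in one dimension. The sketch assumes a fine grid of 2n+1 points mapped to n+1 coarse points, and treats a coarse point as "near the cavitation region" when it or either of its fine-grid neighbours has non-positive pressure; that test and the function name are illustrative assumptions, not details taken from FDMG or the paper.

```python
def restrict_pressure(fine, tol=0.0):
    """Restrict a 1D fine-grid pressure to the next coarser grid."""
    n_coarse = (len(fine) - 1) // 2 + 1
    coarse = [0.0] * n_coarse
    coarse[0], coarse[-1] = fine[0], fine[-1]
    for I in range(1, n_coarse - 1):
        i = 2 * I
        left, mid, right = fine[i - 1], fine[i], fine[i + 1]
        if min(left, mid, right) <= tol:
            # Injection: copy the coincident fine value, so zero pressures
            # are not smeared across the cavitation boundary.
            coarse[I] = mid
        else:
            # Full weighting away from the cavitated zone.
            coarse[I] = 0.25 * left + 0.5 * mid + 0.25 * right
    return coarse
```

Full weighting across the cavitation boundary would average zero and non-zero pressures, which is the situation the switch to injection avoids.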


Relaxation 

The solution for the isothermal point contact problem is obtained by I-line relaxation due to 
the strong coupling in the direction of flow (the x direction). The discrete equations are solved 
simultaneously on a line of points, sweeping across the grid only in the positive y direction due to 
symmetry. On each line of points, the Effective Influence Method is employed, as described above, 
and a tridiagonal system of equations is solved. The criterion for convergence is based on 
comparing the solutions on two grids with meshsizes h and H = 2h. Thus the error, ERR(h,H), as 
used by Venner (ref. 6) to measure convergence is given by: 

rrix '^y 

ERR(h,H) = Kh,Y,Y.\pZ - ■ (15) 

J=1 j=l 
TEST PROBLEM ONE 


This test problem, which appears in Wang (ref. 8), is solved on a single 151 by 81 grid on the 
domain $\{(x,y) : -3.5 \le x \le 1.5,\ -2.0 \le y \le 2.0\}$. For this moderately loaded problem, the 
values of the Moes (ref. 6) dimensionless parameters are M = 99 and L = 16. This in turn gives 
$\lambda = 2.397494 \times 10^{-2}$. The maximum Hertzian pressure, $p_h$, at this load is 1.21 GPa if 
$\alpha = 2.205645 \times 10^{-8}$. The equivalent Hamrock and Dowson (ref. 11) dimensionless parameters, 
with U fixed at $5.6102 \times 10^{-11}$, are $W = 3.4125 \times 10^{-6}$ and G = 4865. 

This problem is solved by using the Effective Influence Newton method for 1500 iterations. 
Every 50 iterations the minimum, Hmin, and central, Hcent, film thicknesses are recorded. Table 1 
shows Hcent and Hmin together with the equivalent minimum film thickness of Hamrock and 
Dowson, HDHmin. The minimum and central film thickness achieved by Wang (ref. 8) after 100 
iterations is 0.28827 x 10“^ at (I,J)=(113,24). Figure 1 shows the profiles of the pressure and film 
thickness along the x-axis. The pressure spike near the outlet is an often observed feature of EHL 
solutions. 


Its    Hcent        Hmin @ (I,J)           HDHmin       RMSRES      SumP     ERROR
  50   0.2679E+00   0.1170E+00 (126, 1)    0.3476E-04   0.171E-02   0.9529   0.144E-01
 100   0.1316E+00   0.5548E-01 (113,18)    0.1648E-04   0.158E-02   1.7756   0.616E-02
 150   0.1505E+00   0.7472E-01 (111,19)    0.2219E-04   0.149E-02   2.0660   0.106E-02
 200   0.1683E+00   0.8592E-01 (111,19)    0.2552E-04   0.143E-02   2.1125   0.294E-03
 250   0.1787E+00   0.9148E-01 (111,19)    0.2717E-04   0.137E-02   2.1109   0.234E-03
 300   0.1849E+00   0.9366E-01 (114,18)    0.2782E-04   0.131E-02   2.1052   0.171E-03
 350   0.1891E+00   0.9504E-01 (114,18)    0.2823E-04   0.126E-02   2.1017   0.123E-03
 400   0.1922E+00   0.9610E-01 (114,18)    0.2854E-04   0.122E-02   2.0996   0.931E-04
 450   0.1946E+00   0.9689E-01 (113,18)    0.2878E-04   0.117E-02   2.0984   0.739E-04
 500   0.1965E+00   0.9753E-01 (113,18)    0.2897E-04   0.113E-02   2.0977   0.605E-04
 750   0.2024E+00   0.9949E-01 (113,18)    0.2955E-04   0.947E-03   2.0958   0.283E-04
1000   0.2053E+00   0.1004E+00 (113,18)    0.2983E-04   0.795E-03   2.0952   0.163E-04
1500   0.2082E+00   0.1013E+00 (113,18)    0.3008E-04   0.583E-03   2.0947   0.702E-05


Table 1: Test Problem One on a single 151 by 81 grid, M=99 & L=16 


Convergence Criteria. Table 1 also shows the errors associated with the solution, from which 
its accuracy can be assessed. If, as in Wang (ref. 8), the convergence criterion is based on the 
change in the solution from one iteration to the next (see equation (14)), labelled 
ERROR in Table 1, then the solution has converged to the order of 10“^. After 100 iterations the 
solution has converged to the order of 10“^ and the corresponding error value found by Wang 
(ref. 8) is 0.182 x 10“^ on the same grid. 

The sum of the pressures over the entire grid, labelled SumP in Table 1, also suggests that the 
iteration is converging, as it tends towards 2.0943, 
thus obeying the force balance equation (8). 

However, if convergence is judged by the Root Mean Square Residual, labelled RMSRES in 
Table 1, then the solution may not have completely converged. The reason for 
this lies in the nature of the Reynolds equation, in which the coefficient $\epsilon$ plays 
a vital role. The pressures in the contact region, $x^2 + y^2 < 1$, are 
larger than those in the non-contact region. This makes the coefficient $\epsilon$ vary by several orders of 
magnitude over the computational domain. Consider the case along the line of symmetry, y=0. In 
the contact region e is very small ranging from 10~^ to 10“^, whereas in the non contact region e 
varies from 10“^ to 10^ as can be seen from Figure 2. Thus, when e is very small the film thickness 
derivative part of the Reynolds equation dominates, whereas when e is large the contribution from 
the film thickness derivative part is minimal. Figure 2 also shows that the residuals are between 
two and four orders of magnitude smaller in the contact region than in the inlet and outlet regions. 



Figure 1: Pressure and Film profiles along y=0. Test Problem One. 


Its    Hcent       Hmin @ (I,J)         RMSRES        SumP     ERR(4,3)
 10    0.187E+00   0.140E+00 (80, 1)    0.34945E-02   1.6123   0.8557E-02
 20    0.109E+00   0.618E-01 (94,31)    0.32213E-02   2.1197   0.1654E-02
 30    0.143E+00   0.772E-01 (95,30)    0.31004E-02   2.1049   0.1021E-02
 40    0.157E+00   0.831E-01 (95,30)    0.30013E-02   2.0965   0.6970E-03
 50    0.166E+00   0.864E-01 (96,29)    0.29144E-02   2.0924   0.5218E-03
 60    0.172E+00   0.887E-01 (96,29)    0.28358E-02   2.0907   0.4102E-03
 70    0.177E+00   0.907E-01 (96,29)    0.27631E-02   2.0889   0.3278E-03
 80    0.181E+00   0.922E-01 (96,29)    0.26951E-02   2.0874   0.2913E-03
 90    0.184E+00   0.934E-01 (96,29)    0.26307E-02   2.0864   0.2887E-03
100    0.186E+00   0.944E-01 (96,29)    0.25695E-02   2.0858   0.2837E-03
150    0.194E+00   0.977E-01 (96,29)    0.22968E-02   2.0853   0.2559E-03
200    0.199E+00   0.996E-01 (95,29)    0.20632E-02   2.0865   0.2272E-03
250    0.202E+00   0.101E+00 (97,28)    0.18582E-02   2.0878   0.2026E-03
300    0.204E+00   0.102E+00 (97,28)    0.16774E-02   2.0888   0.1826E-03
350    0.206E+00   0.102E+00 (97,28)    0.15188E-02   2.0896   0.1661E-03

It is not possible to use this mesh with the FDMG code, which requires the number of mesh points 
in each direction on level k to be of the form $2^k + 1$. Instead, meshes between 129 by 129 and 
17 by 17 are used with FDMG. The results are shown in Table 2 and show broad agreement between 
the two methods. 


TEST PROBLEM TWO 

This test problem, which appears in Venner (ref. 5), is solved on a single 129 by 129 grid and on a 
multigrid hierarchy in which the finest grid is 129 by 129 and the coarsest grid is 17 by 17. Due to 
symmetry, only the nodes in the positive y direction are used. For this lightly loaded problem, the 
values of the Moes dimensionless parameters are M = 20 and L = 10. This in turn gives $\lambda = 0.2$. 
The maximum Hertzian pressure, $p_h$, at this load is 0.58 GPa if $\alpha = 1.7 \times 10^{-8}$. The equivalent 
Hamrock and Dowson dimensionless parameters, with U fixed at $1.0 \times 10^{-11}$, are 
$W = 1.8915 \times 10^{-7}$ and G = 4729. 
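
The two parameter sets quoted for each test problem can be cross-checked. The sketch below assumes the usual point contact relations between the Moes and the Hamrock and Dowson parameters, $M = W(2U)^{-3/4}$ and $L = G(2U)^{1/4}$; these relations are not written out in the paper, and the function names are hypothetical.

```python
def moes_from_hamrock_dowson(W, U, G):
    """Moes load and material parameters (M, L) for a point contact."""
    M = W * (2.0 * U) ** -0.75
    L = G * (2.0 * U) ** 0.25
    return M, L


def hamrock_dowson_from_moes(M, L, U):
    """Hamrock and Dowson load and material parameters (W, G) for a given U."""
    W = M * (2.0 * U) ** 0.75
    G = L / (2.0 * U) ** 0.25
    return W, G
```

For example, `hamrock_dowson_from_moes(20, 10, 1.0e-11)` returns values that round to the W and G quoted for Test Problem Two, and `moes_from_hamrock_dowson(3.4125e-6, 5.6102e-11, 4865)` gives M close to 99 and L close to 16 for Test Problem One.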

This problem was solved using 300 multigrid V-cycles with the results recorded every 10 
iterations as shown in Table 3. The corresponding entries, from a single grid for 1500 iterations 
recorded every 100 iterations, are shown in Table 4. 


Its    Hcent       Hmin @ (I,J)          RMSRES       SumP     ERR(4,3)
 10    0.444E+00   0.387E+00 (84, 1)     0.3368E-02   1.6055   0.100E-01
 20    0.246E+00   0.158E+00 (97,29)     0.3038E-02   2.1670   0.455E-02
 30    0.349E+00   0.225E+00 (99,27)     0.2849E-02   2.1304   0.160E-02
 40    0.380E+00   0.236E+00 (99,26)     0.2700E-02   2.1080   0.854E-03
 50    0.400E+00   0.243E+00 (99,26)     0.2569E-02   2.1081   0.651E-03
 60    0.417E+00   0.251E+00 (100,25)    0.2450E-02   2.1090   0.558E-03
 70    0.429E+00   0.257E+00 (100,25)    0.2341E-02   2.1075   0.468E-03
 80    0.439E+00   0.261E+00 (100,25)    0.2239E-02   2.1056   0.411E-03
 90    0.447E+00   0.265E+00 (100,25)    0.2143E-02   2.1039   0.361E-03
100    0.454E+00   0.268E+00 (101,24)    0.2053E-02   2.1024   0.310E-03
120    0.464E+00   0.272E+00 (100,24)    0.1888E-02   2.1000   0.237E-03
140    0.472E+00   0.275E+00 (100,24)    0.1740E-02   2.0981   0.182E-03
160    0.478E+00   0.278E+00 (100,24)    0.1607E-02   2.0966   0.139E-03
180    0.483E+00   0.280E+00 (100,24)    0.1489E-02   2.0956   0.113E-03
200    0.487E+00   0.281E+00 (100,24)    0.1384E-02   2.0949   0.935E-04
250    0.494E+00   0.284E+00 (100,24)    0.1173E-02   2.0938   0.629E-04
300    0.499E+00   0.286E+00 (100,24)    0.1028E-02   2.0933   0.451E-04


Table 3: Test Problem Two solved using multigrid, 129 by 129, M=20 & L=10 


Its    Hcent        Hmin @ (I,J)           RMSRES       SumP     ERROR
  10   0.1058E+01   0.9588E+00 (107, 1)    0.3103E-01   2.2451   0.138E-01
 100   0.5642E+00   0.4797E+00 (75, 1)     0.2430E-02   1.4367   0.583E-02
 200   0.2143E+00   0.1225E+00 (100,27)    0.2215E-02   2.0141   0.289E-02
 300   0.3136E+00   0.1999E+00 (98,28)     0.2101E-02   2.1684   0.567E-03
 400   0.3585E+00   0.2260E+00 (100,26)    0.2017E-02   2.1243   0.391E-03
 500   0.3780E+00   0.2329E+00 (99,26)     0.1945E-02   2.1089   0.247E-03
 600   0.3932E+00   0.2395E+00 (99,26)     0.1879E-02   2.1067   0.194E-03
 700   0.4062E+00   0.2455E+00 (100,25)    0.1818E-02   2.1047   0.163E-03
 800   0.4167E+00   0.2503E+00 (100,25)    0.1761E-02   2.1028   0.137E-03
 900   0.4253E+00   0.2543E+00 (100,25)    0.1707E-02   2.1014   0.117E-03
1000   0.4326E+00   0.2577E+00 (100,25)    0.1656E-02   2.1004   0.102E-03
1500   0.4576E+00   0.2692E+00 (100,24)    0.1431E-02   2.0976   0.585E-04


Table 4: Test Problem Two solved on a single 129 by 129 grid, M=20 & L=10 


Results. The values of the central (Hcent) and minimum (Hmin) film thickness obtained after 
1500 iterations on a single grid, shown in Table 4, are reached after about 120 multigrid 
iterations, as shown in Table 3. Thus 1500 single grid iterations correspond to about 120 multigrid 
iterations. For this problem, Venner (ref. 6) achieved 0.502 and 0.349 for Hcent and Hmin, 
respectively, using the domain $\{(x,y) : -4.5 \le x \le 1.5,\ -3.0 \le y \le 3.0\}$. If convergence is judged by 
the sum of pressures over the entire grid, labelled SumP, then the value obtained using the multigrid 
method is slightly better than that obtained using the single grid method. Although the change in 
solution between the finest grid, 129 by 129, and the next coarser grid, labelled ERR(4,3) in Table 3, 
and the change in solution from one iteration to the next on a single grid, labelled ERROR in Table 4, 
are evaluated differently, they both seem to suggest that the solution has converged to the order of 
10~^. Venner’s results quote a value of ERR(4,3) of 0.122. The relative computation times on a 
SGI R4400 workstation for the two methods on this problem are 8:00:00 on a single grid for 1500 
iterations and 7:15:00 for 300 multigrid V-cycles. Since about 120 V-cycles reproduce the result of 
the 1500 single grid iterations, the multigrid method provides a means of 
obtaining solutions with greater efficiency. One potential area of difficulty with the multigrid 
method is that if the coarsest grid cannot adequately represent the solution, then the method 
may exhibit convergence difficulties. 

Contour line plots of the film thickness and pressure, showing the formation of side-lobes and 
the spike region, are shown in Figures 3 and 4, respectively. The cavitated region is clearly shown 
on the right side of Figure 4 and is preceded by the pressure spike region, which can be seen more 
clearly in Figure 5. 

Figure 5: 3D pressure profile on MG, 129 by 129, M=20 & L=10. 


CONCLUSIONS 

The numerical results shown in this paper demonstrate how even a relatively standard multigrid 
code may be used to speed up the solution of EHL problems. The combination of the Effective 
Influence Method and multigrid method, which are both effective on their own, also appears to 
work well. 

An outstanding issue concerns the treatment of convergence in EHL problems. From a practical 
engineering point of view it is the pressures and film thicknesses in the contact zone that are of 
interest, and thus it is changes in these pressures which must tend to zero. The much larger 
residuals in the inlet region, where the pressure is close to zero, though a potential cause for 
concern, may not unduly influence the values of pressure in the contact region. Furthermore, the 
Reynolds equation derivation is based on assumptions that are less valid in the inlet region. This is, 
however, an issue that needs to be further explored. 

One possible way of obtaining a better understanding of the relationship between the residual 
and the solution is to compute error indicators in conjunction with adaptive meshes, probably using 
a hierarchy of regular mesh patches to resolve the steep gradients in the pressure. It is this 
approach that will be the subject of our future research in this area. 


SUMMARY 

In this paper we look at Krylov subspace methods for solving the transport equations 
in a slab geometry. The spatial discretization scheme used is a finite element method 
called Modified Linear Discontinuous scheme (MLD). We investigate the convergence 
rates for a number of Krylov subspace methods for this problem and compare with 
the results of a spatial multigrid scheme. 


INTRODUCTION 

Transport equations describe the scattering and re-scattering of particles such as 
neutrons in a nuclear reactor, or light and infra-red radiation in the atmosphere. 
These equations are important, not only in nuclear engineering, but also in the study 
of the effects of greenhouse gases on the climate. A particularly important, although 
simple, model is of a single slab; this leads to integro-differential equations in one 
spatial variable and one angular variable. Unlike elliptic partial differential equations, 
these equations are based on highly non-normal operators, and require special care 
in their numerical treatment, especially for the regimes of physical interest: strong 
scattering, and weak or no absorption. 

In the past decades there has been a great deal of work on numerical methods 
for large scale problems, such as partial differential equations. In this paper we focus 
on two classes of methods, multigrid methods and Krylov subspace methods, as well as their 
application to transport equations. 

In the past decade there has been an enormous development of Krylov subspace 
methods for non-symmetric and indefinite systems. These methods require only three 
operations for their implementation: linear combinations, inner products, 
and matrix-vector products. Of these, it is assumed that matrix-vector products 
are the most complex to compute. As a result, the methods can be efficiently implemented 
on scalar, vector and parallel computers. 

The Krylov subspace methods that have been developed are all based on either 
the symmetric Lanczos, unsymmetric Lanczos, or Arnoldi methods for computing 
bases of Krylov subspaces. They include the CGS (Conjugate Gradient Squared) 
method, which is from the family of methods that use the unsymmetric Lanczos 
method; the GMRES (Generalized Minimal RESidual) method, which uses the 
Arnoldi method; and the LSQR (Least Squares/QR) approach, which uses the symmetric 
Lanczos method. 

In addition, Krylov methods allow the easy incorporation of preconditioners. For 
solving Ax = b, a preconditioner is a matrix B, where Bu can be easily computed 
given a vector u and the system BAx = Bb is easier to solve than the original 
system. Usually this is understood as finding B such that BA is a well conditioned 
matrix. Suitable matrices B can be obtained by a number of different means. If A is 
“diagonally dominant”, then B can be simply the inverse of the diagonal of A; other 
preconditioners are based on Gauss-Seidel or SOR iterations; another source is that 
of incomplete factorizations of sparse matrices, for example, ICCG, which combines 
incomplete Cholesky factorization with conjugate gradients. For a preconditioner to 
be incorporated into a Krylov subspace method, it is sufficient to use a routine to 
compute BAu for a given vector u by first computing v = Au and then using a routine 
to compute Bv. 
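
The last point, that a Krylov method only needs a routine returning BAu, can be sketched as follows. The example uses the simplest preconditioner mentioned above, the inverse of the diagonal of A, on a small made-up matrix; the matrix and all function names are illustrative assumptions.

```python
def matvec(A, u):
    """Dense matrix-vector product v = A u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def jacobi_preconditioner(A):
    """Return a routine applying B = inverse of the diagonal of A."""
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda v: [d * x for d, x in zip(inv_diag, v)]


def preconditioned_operator(A, apply_B):
    """Return a routine computing B A u: first v = A u, then B v."""
    return lambda u: apply_B(matvec(A, u))


# Hypothetical diagonally dominant, non-symmetric test matrix.
A = [[4.0, 1.0, 0.0],
     [2.0, 100.0, 2.0],
     [0.0, 3.0, 1000.0]]
apply_BA = preconditioned_operator(A, jacobi_preconditioner(A))
```

Here the diagonal of BA is exactly 1 and the off-diagonal entries are small, so BA is much better conditioned than A even though the two operators were never multiplied together explicitly.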

Another class of algorithms that has been extensively developed in the past decade 
is that of multigrid, or multilevel, algorithms. These have found a great deal of success in 
dealing with elliptic partial differential equations. Some multigrid methods have 
been developed for solving special cases of transport equations [5, 9, 10]. For 
one-dimensional problems, these can give exceptionally small convergence factors, and 
thus are extremely good methods [4, 5, 9, 10]. The development of parallel software for 
these methods is very time consuming due to the relaxation schemes used. For more 
general problems, and for two and three dimensional problems, the more “generic” 
Krylov subspace methods may be more suitable. 

In this paper the usage of multigrid methods developed in [4, 5, 9] is investigated 
for the case of isotropic scattering with small but significant absorption. This case 
can lead to difficulties with the multigrid method given in [4, 5, 9], as is noted in 
[6]. In [6] a modified algorithm is developed to handle the case with isotropic 
scattering; however, here the Krylov subspace technique GMRES is used with the “pure 
scattering” multigrid algorithm to improve its performance and robustness. 


TRANSPORT EQUATIONS 


The description of the neutron transport problem is given in previous papers 
[1, 3, 5]. For steady state problems within the same energy group for the isotropic 
case (by isotropic we mean that the probability of scattering for the particles is the 
same in all directions), the transport equation in a slab geometry of slab width b 
becomes 

$$\mu \frac{\partial \psi}{\partial x} + \sigma_t \psi = \frac{\sigma_s}{2}\int_{-1}^{1} \psi(x,\mu')\,d\mu' + q(x,\mu), \tag{1}$$


for $x \in (0,b)$ and $\mu \in [-1,1]$. Here, $\psi(x,\mu)$ represents the flux of particles at position 
x traveling at an angle $\theta = \arccos(\mu)$ from the x-axis; $\sigma_t\,dx$, the expected number of 
interactions (absorptive or scattering) that a particle will have in traveling a distance 
dx; $\sigma_s\,dx$, the expected number of scattering interactions; $\sigma_a = \sigma_t - \sigma_s$, the expected 
number of absorptive interactions; and $q(x,\mu)$, the particle source. The boundary 
conditions prescribing particles entering the slab are 

$$\psi(0,\mu) = g_0(\mu), \qquad \psi(b,-\mu) = g_1(\mu), \tag{2}$$

for $\mu \in (0,1)$. 

This problem is difficult for conventional methods to solve in two cases of physical 
interest: 

1. $\gamma = \sigma_s/\sigma_t = 1$ (pure scattering, no absorption). 

2. $1/\sigma_t \ll b$ (optically dense). 

In fact, as $\sigma_t \to \infty$ and $\gamma \to 1$, the problem becomes singularly perturbed. 

In this paper, the spatial discretization is a special finite element method called 
the Modified Linear Discontinuous (MLD) scheme (described in the next section), 
which behaves well in the thick limit. 

In a previous paper this discretization has been solved by a multigrid algorithm
[4]. This multigrid method was based on a two-cell red-black $\mu$-line relaxation [5]
with convergence factors of order $O((1/\sigma_t h)^{})$ when $\sigma_t h \gg 1$, and $O((\sigma_t h)^{})$ when
$\sigma_t h \ll 1$.

Note that these multigrid operators are non-symmetric. Thus if they are used to
precondition a Krylov subspace method, it must be a non-symmetric method such as
GMRES, CGS, or QMR. In this paper we focus on GMRES.

DISCRETIZATION 


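The angular discretization is accomplished by expanding the angular dependence
in Legendre polynomials, and is known as the $S_N$ approximation when the first $N$
Legendre polynomials are used. This results in a semidiscrete set of equations that
resemble collocation at $N$ Gauss quadrature points, $\mu_j$, $j = 1, \ldots, N$, with weights
$w_j$, $j = 1, \ldots, N$. Since the quadrature points and weights are symmetric about zero,
we reformulate the problem in terms of the positive values, $\mu_j$, $j = 1, \ldots, n$, where
$n = N/2$. We define $\psi_j^+ = \psi(x, \mu_j)$ and $\psi_j^- = \psi(x, -\mu_j)$ for $j = 1, \ldots, n$. The
spatial discretization is accomplished by the MLD scheme, which uses elements that
are linear across each cell and discontinuous in the upwind direction. In our grid
representation, the variable $\psi_{i,j}^+$ ($\psi_{i,j}^-$) denotes the flux of particles at position $x_i$ in the
direction $\mu_j$ ($-\mu_j$). The nodal equations are

\[ \frac{\mu_j}{\sigma_t h_i}\left(\psi^+_{i+1/2,j} - \psi^+_{i-1/2,j}\right) + \psi^+_{i,j} = \gamma \sum_{k=1}^{n} w_k \left(\psi^+_{i,k} + \psi^-_{i,k}\right) + q^+_{i,j}, \tag{3} \]

\[ \frac{2\mu_j}{\sigma_t h_i}\left(\psi^+_{i+1/2,j} - \psi^+_{i,j}\right) + \psi^+_{i+1/2,j} = \gamma \sum_{k=1}^{n} w_k \left(\psi^+_{i+1/2,k} + 2\psi^-_{i,k} - \psi^-_{i-1/2,k}\right) + q^+_{i+1/2,j}, \tag{4} \]

and

\[ -\frac{\mu_j}{\sigma_t h_i}\left(\psi^-_{i+1/2,j} - \psi^-_{i-1/2,j}\right) + \psi^-_{i,j} = \gamma \sum_{k=1}^{n} w_k \left(\psi^+_{i,k} + \psi^-_{i,k}\right) + q^-_{i,j}, \tag{5} \]

\[ \frac{2\mu_j}{\sigma_t h_i}\left(\psi^-_{i-1/2,j} - \psi^-_{i,j}\right) + \psi^-_{i-1/2,j} = \gamma \sum_{k=1}^{n} w_k \left(\psi^-_{i-1/2,k} + 2\psi^+_{i,k} - \psi^+_{i+1/2,k}\right) + q^-_{i-1/2,j}, \tag{6} \]

$j = 1, \ldots, n$, $i = 1, \ldots, m$, with boundary conditions

\[ \psi^+_{1/2,j} = g_{0,j}, \qquad \psi^-_{m+1/2,j} = g_{1,j}, \tag{7} \]

$j = 1, \ldots, n$.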

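In our model, $x_{i+1/2}$ and $x_{i-1/2}$ are cell edges, $x_i = \frac{1}{2}(x_{i+1/2} + x_{i-1/2})$ is the cell center,
and $h_i = x_{i+1/2} - x_{i-1/2}$ is the cell width, $1 \le i \le m$. Equations (3) and (5) are called
balance equations and (4) and (6) are called edge equations. In block matrix form
equations (3)-(7) can be written respectively as

\[ B_i\left(\psi^+_{i+1/2} - \psi^+_{i-1/2}\right) + \psi^+_i = \gamma R\left(\psi^+_i + \psi^-_i\right) + q^+_i, \tag{8} \]

\[ 2B_i\left(\psi^+_{i+1/2} - \psi^+_i\right) + \psi^+_{i+1/2} = \gamma R\left(\psi^+_{i+1/2} + 2\psi^-_i - \psi^-_{i-1/2}\right) + q^+_{i+1/2}, \tag{9} \]

\[ -B_i\left(\psi^-_{i+1/2} - \psi^-_{i-1/2}\right) + \psi^-_i = \gamma R\left(\psi^+_i + \psi^-_i\right) + q^-_i, \tag{10} \]

\[ 2B_i\left(\psi^-_{i-1/2} - \psi^-_i\right) + \psi^-_{i-1/2} = \gamma R\left(\psi^-_{i-1/2} + 2\psi^+_i - \psi^+_{i+1/2}\right) + q^-_{i-1/2}, \tag{11} \]

\[ \psi^+_{1/2} = g_0, \qquad \psi^-_{m+1/2} = g_1, \tag{12} \]

$i = 1, \ldots, m$. Here,

\[ B_i = \frac{1}{\sigma_t h_i} \begin{pmatrix} \mu_1 & & 0 \\ & \ddots & \\ 0 & & \mu_n \end{pmatrix}, \quad \text{and} \quad R = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix}, \tag{13} \]

where $\mu_1, \mu_2, \ldots, \mu_n$ are the positive Gauss quadrature points, $w_1, w_2, \ldots, w_n$ are the
Gauss quadrature weights, and $\psi^{\pm}_i$ is an $n$-vector: $\psi^{\pm}_i = (\psi^{\pm}_{i,1}, \ldots, \psi^{\pm}_{i,n})^T$.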

In the computational grid, the inflow for positive angles is on the left, and the
inflow for the negative angles is on the right of the whole domain. Figure 1 shows
the computational domain with $2m + 1$ spatial points and $n$ angles. For a cell
relaxation the inflows of each cell are assumed known. For a $\mu$-line relaxation for the
whole domain only the boundary conditions are assumed known.

Consider cell $i$. In one-cell $\mu$-line relaxation, cell centers $\psi^+_i$ and $\psi^-_i$, together
with the outflow variables $\psi^+_{i+1/2}$ and $\psi^-_{i-1/2}$, will be updated using the following matrix
equation:

\[ A u_i = rhs^1 + rhs^2, \tag{14} \]

where the matrix $A$ is given by

\[ A = \begin{pmatrix}
I + 2B_i - \gamma R & -2\gamma R & -2B_i & \gamma R \\
0 & I - \gamma R & -\gamma R & B_i \\
B_i & -\gamma R & I - \gamma R & 0 \\
\gamma R & -2B_i & -2\gamma R & I + 2B_i - \gamma R
\end{pmatrix}, \tag{15} \]

with

\[ u_i = \left(\psi^+_{i+1/2},\ \psi^-_i,\ \psi^+_i,\ \psi^-_{i-1/2}\right)^T, \]

\[ rhs^1 = \left(0,\ B_i \psi^-_{i+1/2},\ B_i \psi^+_{i-1/2},\ 0\right)^T, \]

and

\[ rhs^2 = \left(q^+_{i+1/2},\ q^-_i,\ q^+_i,\ q^-_{i-1/2}\right)^T. \]

Solving this matrix equation corresponds to performing a $\mu$-line relaxation for one
cell. To solve this system for all variables we consider the cells coupled together; thus.

After the discretizations are chosen for equations (1)-(2) the problem becomes one
of finding the best methods for the solution of $Ax = b$, where $A$ is a $q \times q$ matrix
and $x$ and $b$ are vectors of size $q$. In these methods, iterative solutions of the form
$x^k = x^0 + p^k$ are constructed, where $p^k \in K_k(A, r^0)$. $K_k(A, r^0)$ is the Krylov space
of dimension $k$, where $k \le q$, and is defined as the span of $r^0, Ar^0, \ldots, A^{k-1} r^0$.

The basic conjugate gradient algorithm of Hestenes and Stiefel [2] for symmetric
positive definite matrices minimizes the residual in the $A^{-1}$ norm ($\|x\|_{A^{-1}} = (x^T A^{-1} x)^{1/2}$)
over $K_k(A, r^0)$. After $q$ steps, without roundoff errors, it zeros the residual. For
nonsymmetric matrices this method does not work.

In this paper the solution of non-symmetric discretizations is investigated. Thus
we must consider other Krylov methods. In addition we investigate the use of multigrid
methods as preconditioners.

Krylov subspace methods are based on either the symmetric or unsymmetric Lanczos
methods, or the Arnoldi method, applied either to $A$ or to a closely related matrix.
The symmetric Lanczos and Arnoldi algorithms generate (in exact arithmetic)
orthonormal bases for $K_k(A, r^0)$, while the unsymmetric Lanczos produces a pair of
biorthonormal bases for $K_k(A, r^0)$ and $K_k(A^T, r^0)$, respectively. In both cases the
Lanczos methods produce a tridiagonal matrix that represents the original matrix on
the Krylov subspaces, while the Arnoldi method produces a Hessenberg matrix that
represents the matrix on the Krylov subspace $K_k(A, r^0)$. The unsymmetric Lanczos
process is fast, but can suffer from numerical instability, known as breakdown. There
are variants of these based on the look-ahead Lanczos algorithm, which is a stabilized
version of the unsymmetric Lanczos method.

One of the most commonly used non-symmetric Krylov subspace solvers is GMRES.
This method minimizes the residual over all solution vectors of the form $x^0 + p^k$
where $p^k$ lies in $K_k(A, r^0)$.
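The minimization property of GMRES described above can be sketched in a few lines. The following is a minimal, unrestarted illustration in Python with NumPy (the paper's own implementation used the Meschach library in C); the function name `gmres`, its parameters, and the use of a dense least-squares solve at every step are assumptions made for clarity, not the authors' code.

```python
import numpy as np

def gmres(A, b, x0=None, k_max=50, tol=1e-10):
    """Unrestarted GMRES sketch: minimize ||b - A x|| over x0 + K_k(A, r0)."""
    n = b.shape[0]
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0, [0.0]
    k_max = min(k_max, n)
    Q = np.zeros((n, k_max + 1))      # orthonormal Arnoldi basis of the Krylov space
    H = np.zeros((k_max + 1, k_max))  # upper Hessenberg matrix from Arnoldi
    Q[:, 0] = r0 / beta
    history = [beta]
    x = x0
    for k in range(k_max):
        w = A @ Q[:, k]
        for j in range(k + 1):        # modified Gram-Schmidt orthogonalization
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        # Small least-squares problem: min || beta * e1 - H_k y ||
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        x = x0 + Q[:, :k + 1] @ y
        res = np.linalg.norm(b - A @ x)
        history.append(res)
        if res <= tol * beta or H[k + 1, k] == 0.0:
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return x, history
```

A preconditioned variant would apply the same routine to the preconditioned operator; production codes update a QR factorization of the Hessenberg matrix instead of re-solving the least-squares problem each step.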


MULTIGRID 

To illustrate the multigrid scheme we consider it in the form of two grid levels. We
use the notation $h$ to indicate a fine grid and $2h$ to indicate a coarse grid, although our
grids are not really assumed to be uniform. Let $L^h$ denote the fine grid operator; $L^{2h}$,
the coarse grid operator; and $I_{2h}^h$ and $I_h^{2h}$ the interpolation and restriction operators,
respectively. Let $\nu_1$ and $\nu_2$ be small integers (e.g., $\nu_1 = \nu_2 = 1$), which determine the
number of relaxation sweeps performed before and after the coarse grid correction.
Then one multigrid $V(\nu_1, \nu_2)$ cycle is represented (in two-grid form) by the following:

1. Relax $\nu_1$ times on $L^h u^h = f^h$.

2. Calculate the residual $r^h = f^h - L^h u^h$.

3. Solve approximately $L^{2h} u^{2h} = I_h^{2h} r^h$.

4. Replace $u^h \leftarrow u^h + I_{2h}^h u^{2h}$.

5. Relax $\nu_2$ times on $L^h u^h = f^h$.

The coarse grid operator, $L^{2h}$, is defined as

\[ L^{2h} = I_h^{2h} L^h I_{2h}^h. \]

For the isotropic scattering the multigrid scheme was applied with regard to the
spatial variable in [4, 5].

Figure 1 illustrates grid points on the fine grid and on the coarse grid. The interpolation
and restriction operators for our previous multigrid schemes for transport
equations were defined in [4, 5]. The operator $L^h$ is given in (15). The coarse grid
operator $L^{2h}$ has the same form as $L^h$ but on the new grid.
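The five steps of the two-grid cycle can be sketched on a simple stand-in problem. The example below uses the 1-D Poisson operator with weighted Jacobi relaxation, linear interpolation, and full-weighting restriction; these choices, and all function names, are assumptions for illustration only. The transport scheme of [4, 5] uses $\mu$-line relaxation and its own transfer operators, which are not reproduced here.

```python
import numpy as np

def poisson_matrix(m):
    """1-D model operator (-u'' with Dirichlet ends) on m interior points."""
    h = 1.0 / (m + 1)
    return (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def interpolation(m_coarse):
    """Linear interpolation from m_coarse to 2*m_coarse + 1 interior points."""
    P = np.zeros((2 * m_coarse + 1, m_coarse))
    for j in range(m_coarse):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def relax(L, u, f, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps on L u = f."""
    d = np.diag(L)
    for _ in range(sweeps):
        u = u + omega * (f - L @ u) / d
    return u

def two_grid_cycle(L_h, u, f, P, nu1=1, nu2=1):
    R = 0.5 * P.T                         # full-weighting restriction
    L_2h = R @ L_h @ P                    # coarse grid operator
    u = relax(L_h, u, f, nu1)             # step 1: pre-relaxation
    r = f - L_h @ u                       # step 2: residual
    e_2h = np.linalg.solve(L_2h, R @ r)   # step 3: coarse solve (exact here)
    u = u + P @ e_2h                      # step 4: coarse grid correction
    return relax(L_h, u, f, nu2)          # step 5: post-relaxation
```

In a full multigrid V-cycle, the exact coarse solve in step 3 would be replaced by a recursive call on the coarser level.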

NUMERICAL RESULTS 

The numerical results presented here are for the isotropic transport equations,
both with and without absorption. The methods used are mostly the multigrid
method of [4, 5, 9] for isotropic transport equations without absorption, by itself, and
this method used as a preconditioner for GMRES. The methods were implemented
using the Meschach matrix library in C [12] and were run on a Sun SPARC 20. The
test problems used had 64, 256, or 1024 cells and 16 angles; $\sigma_t h$ has the values $10^{}$, $10^{}$,
$10^{}$, and $10^{}$, under several different regimes for $\gamma = \sigma_s/\sigma_t$. These absorption regimes
are $\gamma = 1 - 1/(\sigma_t h)^{}$, $\gamma = 1 - 1/(\sigma_t h)^{}$, $\gamma = 0.99$, and $\gamma = 1$ (no absorption). The
size of the test problems ranges from 4096 unknowns to 65 536 unknowns; $\sigma_t$ ranges
from 640 to $1.024 \times 10^{}$.

Figure 1: Computational Grid (showing the coarse grid, the fine grid, and cell $i$)

The convergence factors were estimated for randomly generated solutions. The
convergence factor estimate was obtained by taking the geometric average of the ratios
of the norms of the residuals obtained from the last 5 iterations for each method,
except where roundoff error caused the residual norm to plateau.
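The estimate described above is the geometric mean of successive residual-norm ratios. A minimal sketch follows; the function name and the `last` parameter are illustrative, and the exclusion of plateaued residuals is left to the caller.

```python
import math

def convergence_factor(residual_norms, last=5):
    """Geometric mean of the last `last` successive residual-norm ratios."""
    ratios = [b / a for a, b in zip(residual_norms[:-1], residual_norms[1:])]
    tail = ratios[-last:]
    return math.exp(sum(math.log(r) for r in tail) / len(tail))
```

Because the ratios telescope, this equals the fifth root of the ratio between the final residual norm and the one five iterations earlier.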

Note that in the tables an entry of the form 0.xxx($\pm y$) means $0.xxx \times 10^{\pm y}$.

The convergence factor estimates are given in Table 1 ($\gamma = 1 - 1/(\sigma_t h)^{}$), Table 2
($\gamma = 1 - 1/(\sigma_t h)^{}$), Table 3 ($\gamma = 0.99$), and Table 4 ($\gamma = 1$). The first regime is both
of physical interest and also is the more difficult to solve using the standard MLD
discretization and the simple interpolation and restriction operators. This corresponds
to a situation in which the scattered particles undergo a large number of scatterings;
in addition they have a significant probability of being absorbed in a cell, and also
of “escaping” a cell. The numerical difficulty of the problem is clearly evident in the
convergence factors obtained.

                         $\sigma_t h$
# cells  method      10^{}     10^{}     10^{}     10^{}
64       MG          0.262     0.736     0.885     0.931
         MG+GMRES    0.0487    0.191     0.236     0.129
256      MG          0.263     0.741     0.900     0.952
         MG+GMRES    0.0477    0.208     0.550     0.213
1024     MG          0.263     0.741     0.905     0.950
         MG+GMRES    0.0454    0.208     0.695     0.219

Table 1: Convergence factors for $\gamma = 1 - 1/(\sigma_t h)^{}$

                         $\sigma_t h$
# cells  method      10^{}     10^{}     10^{}      10^{}
64       MG          ...       ...       0.266      0.046
         MG+GMRES    ...       ...       0.677(-2)  0.176(-2)
256      MG          ...       0.844     0.722      ...
         MG+GMRES    ...       0.165     0.0559     ...
1024     MG          0.488     0.896     0.895      ...
         MG+GMRES    0.122     0.484     0.255      ...

Table 2: Convergence factors for $\gamma = 1 - 1/(\sigma_t h)^{}$

                         $\sigma_t h$
# cells  method      10^{}     10^{}      10^{}      10^{}
64       MG          ...       0.0530     0.111(-2)  0.121(-4)
         MG+GMRES    ...       0.215(-2)  0.186(-4)  0.221(-6)
256      MG          0.263     0.0530     0.111(-2)  0.121(-4)
         MG+GMRES    0.0477    0.285(-2)  0.190(-4)  0.216(-6)
1024     MG          ...       0.0530     0.111(-2)  0.121(-4)
         MG+GMRES    ...       0.279(-2)  0.189(-4)  0.222(-6)

Table 3: Convergence factors for $\gamma = 0.99$

                         $\sigma_t h$
# cells  method      10^{}      10^{}      10^{}      10^{}
64       MG          0.320(-4)  0.206(-6)  0.119(-5)  0.116(-3)
         MG+GMRES    0.681(-5)  0.710(-7)  0.196(-6)  0.987(-5)
256      MG          0.323(-4)  0.207(-6)  0.187(-4)  0.189(-2)
         MG+GMRES    0.105(-4)  0.910(-7)  0.354(-5)  0.135(-3)
1024     MG          0.324(-4)  0.299(-5)  0.303(-3)  0.0321
         MG+GMRES    0.160(-4)  0.649(-6)  0.230(-4)  0.182(-2)

Table 4: Convergence factors for $\gamma = 1$

Results for diverse Krylov subspace methods using diagonal and ILU (Incomplete
LU factorization) preconditioning are reported in [11], but they were only obtained
for relatively small values of $\sigma_t h$. These methods do not seem adequate for the very
large values of $\sigma_t h$ that are studied here. For example, there it is reported that
the convergence factor for 100 cells, 4 angles, and $\sigma_t h = 1$, using GMRES with an
ILU preconditioner, was 0.705 and clearly deteriorates as the number of cells and
$\sigma_t h$ increase. In contrast, with the multigrid method either used directly or as a
preconditioner, the convergence factor for 256 cells, 16 angles, and $\sigma_t h = 10$ was
$0.734 \times 10^{-}$ in the “no absorption” case.

The worst regime for absorption is that with $\gamma = 1 - 1/(\sigma_t h)^{}$. In this regime,
deterioration in the rates of convergence for both the direct multigrid and the
GMRES/multigrid methods is evident. Nevertheless, with GMRES, the convergence rates
are significantly faster and would give overall rates of convergence at least twice as
fast and up to nearly a factor of 30 faster. Since each step of GMRES only requires
one matrix-vector multiplication for the operator and for the preconditioner and has
negligible overhead, preconditioning would give improved overall speed. The multigrid
methods of [6] appear to give much better convergence factors, but at the cost of
additional complexity of the algorithm, not to mention the additional effort needed
to perform the analysis to design the correct operators for handling this case.
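The "twice as fast" and "factor of 30" figures can be checked from the convergence factors: reaching a fixed tolerance with factor $\rho$ takes about $\log(\text{tol})/\log(\rho)$ iterations, so the ratio of iteration counts for two methods is $\log \rho_{\text{fast}} / \log \rho_{\text{slow}}$. The short sketch below applies this to the 64-cell row of Table 1; the function name is illustrative.

```python
import math

def iteration_speedup(rho_slow, rho_fast):
    """Ratio of iteration counts needed to reach the same tolerance."""
    return math.log(rho_fast) / math.log(rho_slow)

# 64 cells, Table 1: first and last columns (MG versus MG+GMRES).
first_column = iteration_speedup(0.262, 0.0487)
last_column = iteration_speedup(0.931, 0.129)
```

The first column gives a speedup slightly above 2 and the last column one between 28 and 29, in line with the range quoted in the text.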

Outside this regime, the GMRES/multigrid algorithm works consistently better
than the direct multigrid algorithm, and where the original multigrid algorithm performs
well, the GMRES/multigrid algorithm improves the convergence factor by a
factor of as much as 100. However, in these cases it would only roughly halve the
number of iterations needed to achieve a small error tolerance. As noted for the most
difficult regime, where the original multigrid algorithm has difficulty, using it as a
preconditioner for GMRES gives much better results.

CONCLUSIONS 


In this paper the multigrid method for isotropic transport equations of [4, 5,
9] for the “no absorption” case is applied to problems with absorption both as a
pure iterative method and as a preconditioner for GMRES. In all cases, GMRES
improves the convergence factor, although the value of this appears to be much greater
for the cases in which the nonabsorption multigrid algorithm has difficulty (such as
the absorption regime $\gamma = 1 - 1/(\sigma_t h)^{}$). The multigrid algorithm thus has been
demonstrated as an efficient preconditioner for GMRES. Together they are robust
and, in addition, work well for the absorption regime. We expect multigrid methods
to work well for the other Krylov subspace methods which we have used, such as
CGS, LSQR, and CGNE, for which preconditioning is essential.


FAST MULTIGRID TECHNIQUES IN TOTAL VARIATION-BASED IMAGE
RECONSTRUCTION


Mary Ellen Oman

SUMMARY

Existing multigrid techniques are used to effect an efficient method for reconstructing
an image from noisy, blurred data. Total Variation minimization yields a
nonlinear integro-differential equation which, when discretized using cell-centered finite
differences, yields a full matrix equation. A fixed point iteration is applied with
the intermediate matrix equations solved via a preconditioned conjugate gradient
method which utilizes multi-level quadrature (due to Brandt and Lubrecht) to apply
the integral operator and a multigrid scheme (due to Ewing and Shen) to invert the
differential operator. With effective preconditioning, the method presented seems to
require $O(n)$ operations. Numerical results are given for a two-dimensional example.

INTRODUCTION 

The problem of reconstructing an image from noisy, blurred data can be represented
by the model equation

\[ z = Ku + \epsilon, \tag{1} \]

where $K$ is a smoothing operator, $\epsilon$ is noise, and $u$ is to be recovered from noisy data
$z$. $K$ is typically a Fredholm first kind integral operator, $(Ku)(x) = \int_\Omega k(x, y) u(y)\, dy$,
which is compact, so problems of this form are ill-posed; i.e., small perturbations in
the data will produce wildly varying $u$'s.

In the past, attempts to apply multigrid techniques to inverse problems similar to 
this have produced rather disappointing results. Either multigrid has been applied 
directly to (1) without stabilization (see [1] as an example) which produces poor 
quality reconstructions for high noise-to-signal ratios (due to the ill-posedness of the 
problem), or stabilization has been applied, but multigrid displays slow convergence 
(see [2]). In this paper it will be demonstrated how to overcome these difficulties with 
existing multigrid tools, obtaining a fast algorithm to approximate u in (1). 

To stabilize problem (1) Tikhonov regularization, or penalized least squares, is
used:

\[ \min_u T_\alpha(u), \quad \text{where } T_\alpha(u) = \tfrac{1}{2}\|Ku - z\|^2 + \alpha J(u), \tag{2} \]

where $\alpha$ is a positive parameter, and $J$ is a known functional.

Research was supported in part by a DOE-EPSCoR graduate fellowship.

A common choice for $J$ is

\[ J(u) = \int_\Omega |\nabla u|^2, \tag{3} \]

but this assumes $u \in H^1(\Omega)$. Hence, it is unsuitable for image processing applications,
where one wants to recover sharp edges, i.e., discontinuous $u$.

In their seminal paper on Total Variation-based denoising [3], Osher, Rudin, and
Fatemi considered the functional

\[ J(u) = \int_\Omega |\nabla u|. \tag{4} \]

To overcome difficulties associated with nondifferentiability at $\nabla u = 0$, consider the
modification

\[ J_\beta(u) = \int_\Omega \sqrt{|\nabla u|^2 + \beta}\; dx, \quad \beta > 0. \tag{5} \]

For $\beta = 0$, $J_\beta$ is the total variation of $u$. Figure 1 (excerpted from [4]) depicts a
comparison of reconstructions of $u$ in (2). In subplot B, $J$ is as in (3), hence the
reconstruction is smooth; in subplot C, $J = J_\beta$ as in (5), and a blocky image is
recovered; and subplot D shows a filtered Fourier reconstruction of the data. Clearly
Total Variation produces a superior reconstruction in this test case.

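Minimizing $T_\alpha$, as given in (2) with $J$ defined as in (5), yields the nonlinear
integro-differential equation

\[ K^*(Ku - z) + \alpha \nabla J_\beta(u) = 0 \ \text{ for } x \in \Omega, \quad \text{and} \quad \frac{\partial u}{\partial n} = 0 \ \text{ for } x \in \partial\Omega. \tag{6} \]

This can be written in operator form as

\[ \mathcal{K} u + \alpha L(u) u = K^* z, \tag{7} \]

where

\[ \mathcal{K} = K^* K \tag{8} \]

and $L(u)$ is the diffusion operator whose action on a function $v$ is given by

\[ L(u) v = -\nabla \cdot \left( \frac{1}{\sqrt{|\nabla u|^2 + \beta}}\, \nabla v \right). \tag{9} \]

Note that both $\mathcal{K}$ and $L(u)$ are symmetric positive semidefinite operators.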

The following fixed point algorithm [4] can then be applied to handle the nonlinearity:

\[ \left(\mathcal{K} + \alpha L(u^\nu)\right) u^{\nu+1} = K^* z, \quad \nu = 0, 1, \ldots \tag{10} \]

At each iteration it is necessary to solve a non-sparse linear system. This paper
presents multigrid techniques for solving these systems efficiently.

The Denoising section deals with the case when $K$ is the identity operator, the
denoising problem. The Deconvolution section returns to the original problem (1) where
$K$ is a Fredholm first kind integral operator. Included are discussions of multi-level
integration, preconditioning, and a recapitulation of the algorithm. The Numerical
Results section discusses observed convergence rates for the numerical implementation
and includes a two-dimensional example.

Figure 1: Denoised reconstructions obtained using various filtering techniques
(panels: A) Exact and Noisy Data; B) Sobolev $H^1$ Reconstruction; C) TV Reconstruction;
D) Fourier Reconstruction). Dotted lines represent noisy data. Solid line in subplot A
is exact solution. Solid lines in subplots B-D are reconstructions.

DENOISING 


First, consider the case $Ku = u$. This corresponds to denoising an image, and
(10) is reduced to

\[ \left(I + \alpha L(u^\nu)\right) u^{\nu+1} = z, \quad \nu = 0, 1, \ldots \tag{11} \]

At each iteration it is necessary to solve a linear diffusion equation whose diffusivity
depends on the previous iterate $u^\nu$. This iteration is referred to as a “lagged
diffusivity fixed point iteration,” and is denoted here as FP (see [4] for details).

Note that the diffusion coefficient $1/\sqrt{|\nabla u|^2 + \beta}$ is poorly behaved where $\nabla u$ is
large. The cell-centered finite difference discretization [5] is applied to overcome this
difficulty. After discretization, one must solve a sparse, block tridiagonal matrix
equation to obtain $u^{\nu+1}$ at each fixed point iteration. Figure 2 shows the spectrum
of the operator from (11) for a fixed $u$. A preconditioned conjugate gradient method
has been employed with a multigrid preconditioner developed by Ewing and Shen [5].

Figure 2: The spectrum of the discretized operator $I + \alpha L(u)$ for a fixed $u$ in one
space dimension.
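The lagged diffusivity iteration (11) can be sketched in one space dimension. The example below is an illustration under several assumptions that are not taken from the paper: simple forward differences stand in for the cell-centered scheme of [5], each linear system is solved with a dense direct solver instead of multigrid-preconditioned conjugate gradients, and the function names and parameter values are hypothetical.

```python
import numpy as np

def tv_objective(u, z, alpha, beta, h):
    """Discrete analogue of T_alpha with J = J_beta (scaled by 1/h)."""
    du = np.diff(u) / h
    return 0.5 * np.sum((u - z) ** 2) + alpha * np.sum(np.sqrt(du**2 + beta))

def lagged_diffusivity_denoise(z, alpha, beta, iterations=30):
    """Fixed point iteration (I + alpha L(u^nu)) u^{nu+1} = z in 1-D."""
    n = z.size
    h = 1.0 / n
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / h   # forward differences, (n-1) x n
    u = z.copy()
    for _ in range(iterations):
        w = 1.0 / np.sqrt((D @ u) ** 2 + beta)  # diffusivity frozen at u^nu
        L = D.T @ (w[:, None] * D)              # discrete L(u^nu), symmetric
        u = np.linalg.solve(np.eye(n) + alpha * L, z)
    return u
```

Each pass freezes the diffusivity at the previous iterate, so only a linear, symmetric positive definite system has to be solved per step.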

DECONVOLUTION 

Now consider the case when $K$ is a Fredholm first kind integral operator. The
matrix obtained from the discretization of $\mathcal{K} + \alpha L(u^\nu)$ in (10) is no longer sparse.
Hence, to use the lagged diffusivity fixed point iteration as before, a full matrix
equation must be solved for each iteration. The conjugate gradient method can again
be applied but with a cost of $n^2$ operations per iteration. In typical 2-D image
processing applications $n \approx 10^{}$; clearly this operation count is unacceptable. The
Multi-level integration section describes a scheme for reducing the complexity of one
conjugate gradient iteration from $n^2$ to $n$.

Multi-level integration


In [6], Brandt and Lubrecht describe a method based on multigrid techniques for
approximately evaluating $Kv$ which requires only $O(n)$ operations. The general idea
is

\[ K^h v^h \approx \Pi_H^h K^H \Pi_h^H v^h. \tag{12} \]

Here $h$ and $n$ indicate the mesh spacing and number of nodes on the fine grid, and
similarly, $H$ and $N$ indicate the coarse grid with $N \ll n$. $\Pi_H^h$ and $\Pi_h^H$ are coarse-to-fine
and fine-to-coarse intergrid transfer operators, respectively.

To evaluate $K^h v^h$ cheaply, restrict $v^h$ to the coarse grid, apply the coarse grid
operator $K^H$ at a cost of $O(N^2)$ operations, and then interpolate $K^H v^H$ back to the
fine grid.

To see the details of this approximation, choose $q$th order transfer operators: $\Pi_H^h$,
a coarse-to-fine mesh transfer (interpolation), and $\Pi_h^H$, a fine-to-coarse mesh transfer
(restriction). Using $p$th order quadrature, the operation becomes

\begin{align*}
[Kv](x_i^H) &= \int_\Omega k(x_i^H, y)\, v(y)\, dy, \quad i = 1, \ldots, N \\
&= h \sum_{j=1}^{n} k(x_i^H, x_j^h)\, v_j^h + O(h^p) \\
&= h \sum_{j=1}^{n} \left[\Pi_H^h k(x_i^H, x_\cdot^H)\right]_j v_j^h + O(h^p) + O(H^q) \\
&= h \sum_{j=1}^{N} k(x_i^H, x_j^H) \left[(\Pi_H^h)^T v^h\right]_j + O(h^p) + O(H^q). \tag{13}
\end{align*}

Then $[Kv](x_i^H)$ can be interpolated to the fine grid by $\Pi_H^h$ with $O(H^q)$ accuracy.
The entire application looks like

\[ K^h v^h = \Pi_H^h K^H (\Pi_H^h)^T v^h + O(h^p) + O(H^q). \tag{14} \]

If $N \approx \sqrt{n}$ then $O(H^q) = O(h^p)$ provided $q = 2p$, and this calculation requires only $O(n)$
operations and maintains $O(h^p)$ accuracy. To see this, let $n = 2^{lev}$ ($lev > 0$ is the
number of levels, or nested grids), let $n + 1$ be the number of points in the finest mesh
with spacing $h = 1/n$, and let the coarsest mesh have $N + 1$ points with spacing $H = 1/N$,
where $N = 2^{lev/2}$. With second order quadrature ($p = 2$), $K^H \Pi_h^H v^h$ can be calculated
in $O(N^2) = O((2^{lev/2})^2) = O(n)$ operations. Fourth order transfer operators ($q = 4$)
ensure that the accuracy of $\Pi_H^h K^H \Pi_h^H v^h$ is $O(h^2) + O(H^4) = O(h^2)$. Note that
$\Pi_H^h = c(\Pi_h^H)^T$, with $c = H/h$; hence, $\Pi_H^h K^H \Pi_h^H$ is symmetric.

This provides an $O(n)$ method of applying $\mathcal{K}$ which maintains $O(h^p)$ accuracy.
Hence, an iteration of the conjugate gradient method applied to the system (10) will
use only $O(n)$ operations. However, $\mathcal{K} + \alpha L(u)$ is not typically well-conditioned.
The top right subplot of Figure 4 depicts the eigenvalues of this operator for a fixed
$u$, $\alpha$, and $\beta$. Note that these eigenvalues range over three orders of magnitude.
Preconditioning must be used to improve convergence.

Figure 3: Eigenvalues of the preconditioned matrix, $C^{-1}A$, where $Lu = -u''$,
$C = bI + \alpha L$ and $b$ is the maximum eigenvalue of $\mathcal{K}$ (plot title: $\alpha = 0.0003$, $\sigma = 0.1$,
$\mathrm{cond}(C^{-1}A) = 1.489$).
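The two-level idea behind the multi-level integration of (12)-(14) can be illustrated as follows. This is a hedged sketch, not the Brandt-Lubrecht algorithm itself: the kernel form $\exp(-(x-y)^2/\sigma^2)$, the use of linear (second order) interpolation instead of fourth order transfers, the dense interpolation matrix, and all function names are assumptions made to keep the example short.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Assumed smooth kernel k(x, y) = exp(-(x - y)^2 / sigma^2)."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / sigma**2)

def interpolation_matrix(x_fine, x_coarse):
    """Dense matrix of linear interpolation from the coarse to the fine grid."""
    eye = np.eye(len(x_coarse))
    return np.column_stack([np.interp(x_fine, x_coarse, e) for e in eye])

def two_level_apply(v, x_fine, x_coarse, sigma):
    """Approximate h * K^h v by transferring to the coarse grid and back."""
    h = x_fine[1] - x_fine[0]
    P = interpolation_matrix(x_fine, x_coarse)
    K_coarse = gaussian_kernel(x_coarse, x_coarse, sigma)  # only N x N entries
    return h * (P @ (K_coarse @ (P.T @ v)))

def direct_apply(v, x_fine, sigma):
    """Reference O(n^2) evaluation with the full fine-grid kernel matrix."""
    h = x_fine[1] - x_fine[0]
    return h * (gaussian_kernel(x_fine, x_fine, sigma) @ v)
```

With $N \approx \sqrt{n}$ the coarse kernel product costs $O(N^2) = O(n)$; a sparse implementation of the transfers would keep the whole application at $O(n)$, whereas the dense matrix `P` used here does not.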


Preconditioning 


To simplify notation, define

\[ A = \mathcal{K} + \alpha L(u). \tag{15} \]

For insight into the choice of a preconditioner, consider the 1-D case on $0 < x < 1$ with
$L(u)$ replaced by the negative Laplacian and periodic boundary conditions, where $K$
is a convolution operator, $Ku = \int_0^1 k(x - y) u(y)\, dy$, with Gaussian convolution kernel
$k(x)$. Then $L$ has eigenvalues which tend to $\infty$, $\mathcal{K}$ has eigenvalues
which tend to 0, and $L$ commutes with $\mathcal{K}$.

This eigenvalue structure suggests a preconditioner of the form $C = bI + \alpha L$.
Then the iteration matrix becomes $C^{-1}A$ with eigenvalues

\[ \gamma_j = \frac{\kappa_j + \alpha \pi^2 j^2}{b + \alpha \pi^2 j^2}, \tag{16} \]

where $\kappa_j$ denotes the $j$th eigenvalue of $\mathcal{K}$.

The $\gamma_j$ tend to 1 as $j \to \infty$ independent of $b$. To ensure $\gamma_j \approx 1$ for small values of $j$,
choose the largest eigenvalue of $\mathcal{K}$ for $b$,

\[ b = \rho(\mathcal{K}). \tag{17} \]

Figure 3 shows the eigenvalues of the iteration matrix $C^{-1}A$, which result from this
choice of $b$. Notice that $\mathrm{cond}(C^{-1}A) \approx 1$. This implies that the conjugate gradient
method will converge very rapidly.

With the more general diffusion operator defined in (9), this choice of $b$ is still
reasonable as shown in Figure 4. Here, the eigenvalues of the matrices $A$, $C$, and
$C^{-1/2} A C^{-1/2}$ are shown. Although the eigenvalues are not all near one as in the
constant diffusivity case, there is still clustering at one. The “stray” eigenvalues
correspond to jump discontinuities in $u$. Thus, $C = bI + \alpha L(u)$ is an effective
preconditioner for this case as well.

Figure 4: Eigenvalues of the discretized operator matrices $\mathcal{K} = K^*K$, $A$, $C$, and
$C^{-1/2} A C^{-1/2}$, where $K$ is a convolution operator with Gaussian kernel $k(x)$,
$C = bI + \alpha L(u)$, and $L(u)$ is the nonlinear operator as in FP (panel titles:
$\mathcal{K}$ with $\sigma = 0.1$; $A = \mathcal{K} + \alpha L(u)$ with $\alpha = 0.01$; $C = bI + \alpha L(u)$ with $b = 0.6334$;
$C^{-1/2} A C^{-1/2}$).
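The clustering argument for the constant diffusivity model problem can be checked numerically, since on a periodic grid both operators are circulant and share the Fourier eigenvectors. The sketch below is an illustration under stated assumptions: the kernel is taken as $\exp(-x^2/\sigma^2)$ with periodic wrap-around, $L$ is the standard periodic second difference, and the parameter values follow the title of Figure 3; none of this is the author's code.

```python
import numpy as np

def preconditioned_spectrum(n=256, sigma=0.1, alpha=3e-4):
    """Eigenvalues of A = K^T K + alpha L and of C^{-1} A, C = b I + alpha L."""
    h = 1.0 / n
    x = np.arange(n) * h
    dist = np.minimum(x, 1.0 - x)                    # periodic distance to 0
    k_col = h * np.exp(-(dist**2) / sigma**2)        # first column of circulant K
    kappa = np.abs(np.fft.fft(k_col)) ** 2           # eigenvalues of K^T K
    j = np.arange(n)
    lam = (2.0 - 2.0 * np.cos(2.0 * np.pi * j / n)) / h**2  # eigenvalues of L
    b = kappa.max()                                  # b = spectral radius of K^T K
    eig_A = kappa + alpha * lam
    eig_pre = eig_A / (b + alpha * lam)              # operators commute
    return eig_A, eig_pre

eig_A, eig_pre = preconditioned_spectrum()
cond_A = eig_A.max() / eig_A.min()
cond_pre = eig_pre.max() / eig_pre.min()
```

Under these assumptions the unpreconditioned eigenvalues spread over several orders of magnitude, while the preconditioned ones stay in a narrow band below one, which is the behavior the choice $b = \rho(\mathcal{K})$ is meant to produce.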
A fast reconstruction algorithm


The fixed point iteration and preconditioned conjugate gradient techniques described
above can be combined to form an efficient reconstruction algorithm. What
follows is the outline of such an algorithm for the two-dimensional deconvolution
problem. This algorithm is used to obtain the numerical results presented in the
following section.

* Apply fixed point iteration as in (10).

* To solve $(\mathcal{K} + \alpha L(u^\nu)) u^{\nu+1} = K^* z$, apply a preconditioned conjugate gradient
method with preconditioner $C = bI + \alpha L(u^\nu)$ with $b = \rho(\mathcal{K})$.

* Within the preconditioned conjugate gradient method, use multi-level integration
for each application of $\mathcal{K}$.

* Within each iteration of the preconditioned conjugate gradient method, solve
equations of form $Cv = (bI + \alpha L)v = f$ by a preconditioned conjugate gradient
method with the Ewing-Shen multigrid preconditioner [5].

Notice that $C = bI + \alpha L$ is essentially the same as the operator in a fixed point
iteration of the denoising problem (11). The multi-level integration is $O(n)$ as shown
above and in [6]. Therefore, the complexity of the preconditioned conjugate gradient
method to solve $(\mathcal{K} + \alpha L(u^\nu)) u^{\nu+1} = K^* z$ depends on the complexity of solving
$Cv = f$.
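The preconditioned conjugate gradient step used in the outline above can be sketched generically. In the sketch, `apply_A` and `apply_C_inv` are hypothetical callables: in the setting of the paper the first would apply $\mathcal{K} + \alpha L(u^\nu)$ using multi-level integration, and the second would perform the inner solve with $C = bI + \alpha L(u^\nu)$; here they are left abstract.

```python
import numpy as np

def pcg(apply_A, b, apply_C_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_C_inv(r)
    p = z.copy()
    rz = r @ z
    norms = [np.linalg.norm(r)]
    for _ in range(max_iter):
        if norms[-1] <= tol * norms[0]:
            break
        Ap = apply_A(p)
        step = rz / (p @ Ap)
        x = x + step * p
        r = r - step * Ap
        z = apply_C_inv(r)            # preconditioner application
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new search direction
        rz = rz_new
        norms.append(np.linalg.norm(r))
    return x, norms
```

The returned residual norms are the quantities from which a per-iteration convergence factor can be computed.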


NUMERICAL RESULTS 


In Figures 5 and 6, the operator $K$ has been taken to be a convolution integral
operator with the Gaussian kernel $k(x)$, (18),
as in the Multi-level Integration section. Figure 5 presents convergence results for
this 2-D example with noise-to-signal ratio = 1 and kernel-width parameter, $\sigma =
0.075$. Subplot A depicts the norms of the differences between successive iterates.
Subplot B shows the norm of the gradient of $T_\alpha$, as in (6). Subplot C plots the
preconditioned conjugate gradient convergence factor for each fixed point iteration
where the geometric mean convergence factor is calculated by

\[ \text{convergence factor} = \exp\left( \frac{1}{M} \sum_{m=1}^{M} \log \frac{\|res^{m+1}\|}{\|res^{m}\|} \right), \tag{19} \]

where $res^m$ is the residual calculated at the $(m-1)$st preconditioned conjugate gradient
iterate. Subplot D records the norms of the residuals at each preconditioned
conjugate gradient iteration for the tenth fixed point iteration. Figure 6 shows the
noisy data (with noise-to-signal ratio = 1), $z = K u_{exact} + \epsilon$, and the subsequent
reconstruction obtained by the above algorithm.

Figure 5: Subplots A and B show the norms of the differences between iterates and the
gradient of the function $T_\alpha$, respectively. Subplot C contains the convergence history
of the preconditioned conjugate gradient method with preconditioner $C = bI + \alpha L$
at each fixed point iteration. Subplot D plots the residuals of the preconditioned
conjugate gradient method for 5 iterations at the tenth fixed point iteration.

These results show that the described algorithm can be used to obtain reconstructions
even for very noisy data. The convergence of the preconditioned conjugate
gradient method is quite fast as evidenced by Figure 5, Subplots C and D. It is known
that the multi-level integration method has $O(n)$ complexity. Hence, the complexity
of the preconditioned conjugate gradient method to solve $(\mathcal{K} + \alpha L(u^\nu)) u^{\nu+1} = K^* z$
depends on the complexity of solving $Cv = f$, where $C = bI + \alpha L(u^\nu)$. This system
is nearly identical to the one obtained in the discretization of the denoising problem,
and for the results given here the same solver has been used, i.e., a preconditioned
conjugate gradient with a cell-centered finite difference multigrid preconditioning step.
This method appears to be nearly $O(n)$ in complexity.

Figure 6: Subplot A shows the exact solution. Subplot B shows the kernel of the
convolution operator. Subplots C and D show the data with added noise (noise-to-signal
ratio = 1). Subplots E and F show the subsequent reconstruction with the
algorithm described.


A MULTILEVEL ALGORITHM FOR THE SOLUTION OF
SECOND ORDER ELLIPTIC DIFFERENTIAL
EQUATIONS ON SPARSE GRIDS


Christoph Pflaum

Institut für Informatik, Technische Universität München
D-80290 München, Germany


SUMMARY


A multilevel algorithm is presented that solves general second order elliptic partial
differential equations on adaptive sparse grids. The multilevel algorithm consists of
several V-cycles. Suitable discretizations ensure that the discrete equation system
can be solved in an efficient way. Numerical experiments show a convergence rate of
order $O(1)$ for the multilevel algorithm.


1 Introduction 

In 1990, Bungartz and Zenger used hierarchical bilinear finite elements on a sparse 
grid to discretize Poisson’s equation on the unit square (see [1] and [2]). The discrete 
equation system was solved by a recursive algorithm. Balder extended this idea for 
the solution of the Helmholtz equation in the d-dimensional space (see [3]). 

In this paper, a multilevel algorithm is presented, that solves general second order 
elliptic partial differential equations on adaptive sparse grids. This multilevel algo¬ 
rithm consists of several V-cycles in one direction and of a Gauss-Seidel relaxation on 
each level. The restrictions of these V-cycles are a semicoarsening. Thus, the multi¬ 
level algorithm is similar to the multilevel algorithm in [4] and [5]. The Gauss-Seidel 
relaxation and the restriction and prolongation is made like the multilevel projection 
method in [6]. The multilevel cycle of the sparse grid multilevel algorithm is called 
Q-cycle. The problem of this Q-cycle is the calculation of the right hand side during 
the restriction. In case of general second order elliptic differential the exact stiffness 
matrix is so complicated that it is not possible to calculate the right hand side in 
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an efficient way. This means that one multilevel cycle costs more than 0{N‘^) opera¬ 
tions, while 0{N log N) is the number of sparse grid points. Thus, it is necessary to 
approximate the bilinear form a corresponding to the elliptic equation. 

We studied two approximations of the bilinear form $a$. First, the variable coefficients
in the bilinear form were replaced by a piecewise constant sparse grid interpolant.
Then, it is possible to calculate the right hand side in an efficient way. But
a further simplification of the bilinear form $a$ is possible. For Laplace's equation,
some hierarchical basis functions are orthogonal with respect to the corresponding
bilinear form. Therefore, it makes sense to replace the bilinear
form $a$ by a simplified bilinear form $\tilde a_h$, which has similar orthogonality properties
even for general elliptic differential equations. This gives the discretization
with semi-orthogonality (see section 3). Convergence of order $O(N^{-2} \log N)$
could be proved for this discretization of the Helmholtz equation (see [7]). Numerical
experiments show the same convergence behavior as for the original bilinear form
even for more complicated elliptic differential equations. The advantage of the
semi-orthogonality is that the Q-cycle of the sparse grid multilevel algorithm becomes as
simple as the V-cycle on full grids with bilinear finite elements. The reason for this is
that nearly the same equations can be used for both multilevel cycles. On every level,
relaxations are made with a nine-point stencil. The restriction and the prolongation
from one level to another are made in the same way as in the case of full grids.
For this, it is only necessary to ignore the sparse grid points which are not contained in
the actual level. This is allowed by the semi-orthogonality. All numerical experiments
show a convergence rate of order $O(1)$ for the sparse grid multilevel algorithm. The
multilevel algorithm requires only $O(N \log N)$ operations per cycle.

For simplicity, the discretizations and the algorithms in this paper are explained
only for the regular sparse grids $\mathcal{D}_n$. However, it is possible to generalize the
algorithms to adaptive sparse grids. The Q-cycle has been implemented for adaptive
sparse grids and solves general second order elliptic differential equations.

Throughout the paper, $h = 2^{-n}$, where $n \in \mathbb{N}$, and $\Omega = \,]0,1[^2$.

2 Sparse Grids and Sparse Grid Interpolation 

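The set of one-dimensional grid points is

$$\mathcal{D} = \Big\{ \sum_{i=0}^{n} d_i \, 2^{-i} \;\Big|\; n \in \mathbb{N}_0 \text{ and } d_0 = 1,\ d_1 = -1,\ d_2, \dots, d_n \in \{1,-1\} \Big\} \cup \{0\}.$$

These points are illustrated in Figure 1. For every $x \in \mathcal{D} \setminus \{0\}$, there exist unique
$n \in \mathbb{N}_0$ and $d_0 = 1$, $d_1 = -1$, $d_2, \dots, d_n \in \{1,-1\}$ such that $x = \sum_{i=0}^{n} d_i \, 2^{-i}$.
Therefore, we can define the depth of a point $x \in \mathcal{D}$ by

$$T(0) = 0, \qquad T(x) = n \quad \text{for } x \in \mathcal{D} \setminus \{0\}.$$

The regular sparse grids $\mathcal{D}_n$ and $\mathring{\mathcal{D}}_n$ are defined by

$$\mathcal{D}_n := \{ (x,y) \in \mathcal{D} \times \mathcal{D} \mid T(x) + T(y) \le n + 1 \},$$

$$\mathring{\mathcal{D}}_n := \mathcal{D}_n \cap \,]0,1[^2,$$

where $n \in \mathbb{N}$. A more detailed description of general abstract and adaptive sparse grids and
their properties is given in [8] and [9].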
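The definitions of the depth and of the regular sparse grid can be checked with a short script. The following is a minimal sketch, not part of the original implementation; the function names `depth`, `points_1d` and `sparse_grid` are hypothetical, and it assumes that the grid points of depth $n \ge 1$ are exactly the odd multiples of $2^{-n}$ in $]0,1[$ (which follows from $d_0 = 1$, $d_1 = -1$), while $0$ and $1$ have depth $0$.

```python
from fractions import Fraction


def depth(x):
    """Depth T(x) of a one-dimensional grid point x in [0, 1]."""
    x = Fraction(x)
    if x == 0 or x == 1:
        return 0                      # T(0) = 0, and x = 1 is the case n = 0
    den = x.denominator               # interior points are odd multiples of 2^-n
    n = den.bit_length() - 1
    if not 0 < x < 1 or den != 1 << n:
        raise ValueError("not a dyadic grid point in [0, 1]")
    return n


def points_1d(max_depth):
    """All one-dimensional grid points of depth <= max_depth."""
    pts = [Fraction(0), Fraction(1)]
    for n in range(1, max_depth + 1):
        pts.extend(Fraction(j, 2 ** n) for j in range(1, 2 ** n, 2))
    return pts


def sparse_grid(n, interior=False):
    """Points (x, y) with T(x) + T(y) <= n + 1; optionally only those in ]0,1[^2."""
    pts = points_1d(n + 1)
    grid = [(x, y) for x in pts for y in pts if depth(x) + depth(y) <= n + 1]
    if interior:
        grid = [(x, y) for (x, y) in grid if 0 < x < 1 and 0 < y < 1]
    return grid
```

For $n = 3$ this sketch gives 17 interior sparse grid points, compared with $(2^3 - 1)^2 = 49$ interior points of the full grid with the same mesh size.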



Figure 1. Tree of possible grid points.


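Now, we will define the sparse grid interpolation with piecewise bilinear functions.
For every $x \in \mathcal{D}$ and $k \in \mathbb{N}$, we define the piecewise linear function

$$v_x^k : [0,1] \to \mathbb{R}, \qquad v_x^k(t) := \max\{0,\ 1 - 2^k |t - x|\},$$

and for every $(x,y) \in \mathcal{D} \times \mathcal{D}$ and $k, l \in \mathbb{N}$ the piecewise bilinear function

$$v_{(x,y)}^{k,l}(s,t) := v_x^k(s) \cdot v_y^l(t).$$

The hierarchical basis function of the point $(x,y) \in \mathcal{D} \times \mathcal{D}$ is the function

$$v_{(x,y)} := v_{(x,y)}^{T(x),T(y)}.$$

There are two regular finite element spaces for the regular sparse grid $\mathcal{D}_n$:

$$V_{\mathcal{D}_n} := \mathrm{span}_{\mathbb{R}} \{ v_z \mid z \in \mathcal{D}_n \} \subset W_2^1(\Omega) \cap C(\bar\Omega) \quad \text{and}$$

$$\mathring V_{\mathcal{D}_n} := \mathrm{span}_{\mathbb{R}} \{ v_z \mid z \in \mathring{\mathcal{D}}_n \} \subset \mathring W_2^1(\Omega) \cap C(\bar\Omega).$$

There is a unique sparse grid interpolation operator $I_{\mathcal{D}_n} : C(\bar\Omega) \to V_{\mathcal{D}_n}$ such that

$$I_{\mathcal{D}_n}(f)(z) = f(z) \quad \text{for all } z \in \mathcal{D}_n \quad \text{(see [2])}.$$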

The sparse grid interpolation error with piecewise bilinear functions is now: 

Theorem 1 (Sparse Grid Interpolation Error). There exists a constant C > 0 such 
that the error in the TT 2 -norm is for h = 2“” 


\\f-^v.{f)\\wi < C\\f\\„a..h 

< cm„a,Mogh~' 

where 

H°-\a) := {/ € L 2 (fi) Ill/llffOJ < 00 } and 


for / 6 

for f e Wf'=(0), 


and 


HO,I 


{ 

Qi+3f 

\ 

\ 

dx^dy^ 

L 2 ' i+j<l, i,j<^+l 


h 
The proof of Theorem 1 is given in [1] and [7]. At the end of this section, we
define the following full grids and a full grid finite element space:

$$\Omega_{k,l} := \{ (x,y) \in \mathcal{D} \times \mathcal{D} \mid T(x) \le k \text{ and } T(y) \le l \} \quad \text{and} \quad \mathring\Omega_{k,l} := \Omega_{k,l} \cap \,]0,1[^2,$$

$$\mathring V^{k,l} := \mathrm{span}_{\mathbb{R}} \{ v_z \mid z \in \mathring\Omega_{k,l} \}.$$

3 Discretization of Elliptic Equations 

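We use the same notation as in [10]. Let $f \in (L_2(\Omega))'$ and

$$a : \mathring W_2^1(\Omega) \times \mathring W_2^1(\Omega) \to \mathbb{R}, \qquad (u,v) \mapsto \int_\Omega \sum_{|\alpha|,|\beta| \le 1} a_{\alpha,\beta}\, D^\alpha u\, D^\beta v \; d(x,y),$$

where $\alpha, \beta$ are multiindices and $A = (a_{\alpha,\beta})_{|\alpha|,|\beta| \le 1}$ has entries in $C(\bar\Omega)$. Let us assume that
$a$ is continuous and $\mathring W_2^1(\Omega)$-elliptic. We are looking for a solution $u \in \mathring W_2^1(\Omega)$ of the
equation

$$a(u,v) = f(v) \quad \text{for all } v \in \mathring W_2^1(\Omega). \qquad (1)$$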

The problem is now that we cannot replace $\mathring W_2^1(\Omega)$ by the finite element space $\mathring V_{\mathcal{D}_n}$
and use the same bilinear form $a$. If we did so, we would get a manifold of stiffness
matrices of dimension more than $O(2^n n)$ for this class of elliptic equations. Then,
we would not be able to store the stiffness matrix in a sparse grid data structure.
Therefore, we replace the bilinear form $a$ by an approximate bilinear form. First, we
replace $a$ by

$$a_h : \mathring W_2^1(\Omega) \times \mathring W_2^1(\Omega) \to \mathbb{R}, \qquad (u,v) \mapsto \int_\Omega \sum_{|\alpha|,|\beta| \le 1} I_{\mathcal{D}_n}(a_{\alpha,\beta})\, D^\alpha u\, D^\beta v \; d(x,y),$$

where $I_{\mathcal{D}_n}(a_{\alpha,\beta})$ is a suitable sparse grid interpolant. Second, we replace $a_h$ by a
bilinear form $\tilde a_h$ with a semi-orthogonality property. For the definition of the
semi-orthogonality property, we need the set of pairs of semi-orthogonal grid points (see
Figure 2)

o»:=o!.uo;. 

where 

Oi := {{{x,y),{x',y'))eAxVr,\T{x)<T{x') and T{y)>T{y') and 
supp(u(a;,y)) n supp(?;(j;/y)) fi = 0 and 
supp(u(a;,j/)) n SUpp(U(,,/y)) 7^ 0}, 

'■= {iz,z') e VnX Vn\{z',z) e Ol} and 
supp(u) := {z G Q\v{z) 7 ^ 0} for v G C(0). 
Observe that here the support supp of a function is not compact in general.
Now we define the semi-orthogonality property of a bilinear form.

Definition 1 (Semi-Orthogonality Property).
A bilinear form $b : \mathring V_{\mathcal{D}_n} \times \mathring V_{\mathcal{D}_n} \to \mathbb{R}$ has the semi-orthogonality property if

$$b(v_z, v_{z'}) = 0 \quad \text{for every } (z, z') \in O_h.$$


Figure 2. Supports of Hierarchical Basis Functions of Semi-Orthogonal Grid Points
(labels in the figure: the point $(x,y)$ and a region marked "no grid points").


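A simple calculation shows that the bilinear form $(u, v) \mapsto \int_\Omega \langle \nabla u, \nabla v \rangle \, d(x,y)$ has
the semi-orthogonality property. For general second order elliptic differential
equations, we define the discrete bilinear form $\tilde a_h$ by its values on the hierarchical
basis:

$$\tilde a_h : \mathring V_{\mathcal{D}_n} \times \mathring V_{\mathcal{D}_n} \to \mathbb{R},$$

$$\tilde a_h(v_z, v_{z'}) := \begin{cases} 0 & \text{for } (z,z') \in O_h, \\ a_h(v_z, v_{z'}) & \text{for } (z,z') \notin O_h. \end{cases}$$

Obviously, $\tilde a_h$ has the semi-orthogonality property. The discretization of equation (1)
with semi-orthogonality is now:

Discretization with Semi-Orthogonality. Find a $u_h \in \mathring V_{\mathcal{D}_n}$ such that

$$\tilde a_h(u_h, v_h) = f(v_h) \quad \text{for all } v_h \in \mathring V_{\mathcal{D}_n}.$$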


4 Multilevel Algorithm 

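Let $\hat a_h$ be the bilinear form $a_h$ or $\tilde a_h$. We want to solve the following problem:

Discrete Equation System. Find $u_h \in \mathring V_{\mathcal{D}_n}$ such that

$$\hat a_h(u_h, v_h) = f(v_h) \quad \forall v_h \in \mathring V_{\mathcal{D}_n}. \qquad (1)$$

Assume that $\tilde u \in \mathring V_{\mathcal{D}_n}$ is an approximate solution of the discrete equation system.
Obviously, there are $\lambda_z \in \mathbb{R}$ such that $\tilde u = \sum_{z \in \mathring{\mathcal{D}}_n} \lambda_z v_z$. For fixed $k, l \in \mathbb{N}$,
$k + l \le n + 1$, we make the following decomposition:

$$\tilde u = \tilde u^{k,l} + \tilde u_{\mathrm{rest}},$$

where

$$\tilde u^{k,l} = \sum_{z \in \mathring\Omega_{k,l}} \lambda_z v_z \in \mathring V^{k,l} \quad \text{and} \quad \tilde u_{\mathrm{rest}} = \sum_{\substack{z = (x,y) \in \mathring{\mathcal{D}}_n \\ T(x) > k \,\vee\, T(y) > l}} \lambda_z v_z.$$

For the construction of a multilevel algorithm, we have to push $\tilde u_{\mathrm{rest}}$ to the right hand
side. Thus, we define

$$f^{k,l}(v_h) := f(v_h) - \hat a_h(\tilde u_{\mathrm{rest}}, v_h) \quad \text{for } v_h \in \mathring V^{k,l}. \qquad (2)$$

Now, we can define the

Equation System of Level $(k,l)$. Find $u^{k,l} \in \mathring V^{k,l}$ such that

$$\hat a_h(u^{k,l}, v_h) = f^{k,l}(v_h) \quad \forall v_h \in \mathring V^{k,l}. \qquad (3)$$

Naturally, if $\tilde u = u$ is the exact solution of the discrete equation system, then $\tilde u^{k,l}$
is the solution of the equation system of level $(k,l)$. If $\tilde u^{k,l}$ is the solution of the
equation system of level $(k,l)$ for every $k + l \le n + 1$, then $\tilde u = u$ is the exact solution
of the discrete equation system.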

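For relaxations, it is helpful to write the equation system of level $(k,l)$ as a matrix
equation. Therefore, we define the vectors $U^{k,l} = (U_z^{k,l})_{z \in \mathring\Omega_{k,l}}$ and
$F^{k,l} = (F_z^{k,l})_{z \in \mathring\Omega_{k,l}}$ and the matrix $A^{k,l} = (A_{z,z'}^{k,l})_{z,z' \in \mathring\Omega_{k,l}}$ by

$$u^{k,l} = \sum_{z \in \mathring\Omega_{k,l}} U_z^{k,l}\, v_z^{k,l}, \qquad (4)$$

$$F_z^{k,l} := f^{k,l}(v_z^{k,l}) \quad \text{and} \qquad (5)$$

$$A_{z,z'}^{k,l} := \hat a_h(v_{z'}^{k,l}, v_z^{k,l}). \qquad (6)$$

The following matrix equation is equivalent to the equation system of level $(k,l)$:

Matrix Equation of Level $(k,l)$. Find $U^{k,l} = (U_z^{k,l})_{z \in \mathring\Omega_{k,l}}$ such that

$$A^{k,l}\, U^{k,l} = F^{k,l}. \qquad (7)$$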

Now, we want to construct a multilevel algorithm. The principal data to be stored
are:

• $k, l$: depth of the actual level. $2^{-k}$ and $2^{-l}$ are the mesh sizes of the full grid
  $\mathring\Omega_{k,l}$ corresponding to the actual level,
• $(U_z)_{z \in \mathring{\mathcal{D}}_n}$: the actual approximate solution,
• $(F_z)_{z \in \mathring{\mathcal{D}}_n}$: the right hand side of the actual level, and
• $(H_z)_{z \in \mathring{\mathcal{D}}_n}$: the one-dimensional hierarchical surplus in the direction of the last
  restriction.


First, we have to define a relaxation step in the level {k,l). Let 

_ -fe,/ I 

'^old ^old ^old^rest 

be the decomposition of the actual approximate solution. Assume [4 = tor 

O 

all ^ G Q.k,i. 


Procedure: Relaxation 

Choose , for the start solution of (7). Make a standard relaxation 

step (e.g. Gauss-Seidel-relaxation) of equation (7). This gives the new 
approximate solution on the level {k,l). 


After one relaxation step, we define Unewi^) •— for all 2 : G hk,i- ^new ^ 
is the new approximate solution on the level {k, 1). The new approximate solution is 
now 


'O'new • ^new ^ 


~k,l 

oldyVesV 


o 

But after one relaxation, we only have Unew{z) = Uz for all z G Q.k,i- For the 
propagation of Unew to other grid points, we need the procedures restriction and 
prolongation. The procedure prolongation calculates Unew on the new level. 


Procedure: Restriction in x-direction

For $(x,y) \in \mathring\Omega_{k,l}$ with $T(x) = k$ do
$H_{(x,y)} := U_{(x,y)} - 0.5 * (U_{(x+2^{-k},y)} + U_{(x-2^{-k},y)})$;

Procedure: Prolongation in x-direction

For $(x,y) \in \mathring\Omega_{k,l}$ with $T(x) = k$ do
$U_{(x,y)} := H_{(x,y)} + 0.5 * (U_{(x+2^{-k},y)} + U_{(x-2^{-k},y)})$;

The procedures Restriction in y-direction and Prolongation in y-direction are
defined analogously.
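The restriction and the prolongation in one direction are inverse to each other on the points of depth $k$. The following is a minimal one-dimensional sketch (the $y$ coordinate is held fixed and dropped); it assumes that the restriction stores the hierarchical surplus, that is, the nodal value minus the mean of its two neighbours at distance $2^{-k}$, and that the prolongation adds this mean back. The names `restrict_x`, `prolong_x` and `depth_k_points` are hypothetical.

```python
from fractions import Fraction


def depth_k_points(k):
    """Interior one-dimensional points of depth k: the odd multiples of 2^-k."""
    return [Fraction(j, 2 ** k) for j in range(1, 2 ** k, 2)]


def restrict_x(U, k):
    """Surpluses H[x] = U[x] - 0.5 * (U[x + h] + U[x - h]) for T(x) = k, h = 2^-k."""
    h = Fraction(1, 2 ** k)
    return {x: U[x] - (U[x + h] + U[x - h]) / 2 for x in depth_k_points(k)}


def prolong_x(U, H, k):
    """Overwrite U[x] for T(x) = k by H[x] + 0.5 * (U[x + h] + U[x - h])."""
    h = Fraction(1, 2 ** k)
    for x in depth_k_points(k):
        U[x] = H[x] + (U[x + h] + U[x - h]) / 2


# Nodal values of u(x) = x^2 on the grid with mesh size 2^-3 (boundary included).
k = 3
nodes = [Fraction(j, 2 ** k) for j in range(2 ** k + 1)]
U = {x: x * x for x in nodes}

H = restrict_x(U, k)                 # surpluses on the points of depth 3
V = dict(U)
for x in H:
    V[x] = Fraction(0)               # forget the fine values ...
prolong_x(V, H, k)                   # ... and recover them from the surpluses
```

For $u(x) = x^2$ every surplus equals $-h^2$ with $h = 2^{-k}$, which illustrates why the surpluses of smooth functions decay quickly with the depth.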

The procedures Restriction and Prolongation calculate

$$U_z := \tilde u_{\mathrm{new}}(z) \quad \text{for } z \in \mathring\Omega_{k_{\mathrm{new}}, l_{\mathrm{new}}},$$

where $(k_{\mathrm{new}}, l_{\mathrm{new}})$ is the new level. The procedures Restriction and Prolongation can
do this only if the multilevel algorithm satisfies the following rule:

Restriction-Prolongation-Rule

Assume that Restriction in x-direction was used from the level $(k', l')$ to
the level $(k' - 1, l')$. Then use Prolongation in x-direction with $k = k' - 1$
next time only if $l = l'$.

Assume that Restriction in y-direction was used from the level $(k', l')$ to
the level $(k', l' - 1)$. Then use Prolongation in y-direction with $l = l' - 1$
next time only if $k = k'$.

Last, we need the procedure 

Procedure: Calculation of the right hand side

This procedure calculates $F_z := F_z^{k,l}$ for all grid points $z \in \mathring\Omega_{k,l}$.


Figure 3: Q-Cycle of the multilevel algorithm on a sparse grid
Now we can explain the Q-cycle (see Figure 3):


THE Q-CYCLE {

    Step 1: Way in x-direction
    LET k := 1;
    WHILE k < n {

        Step 1.1: V-cycle in one direction
        LET l := n - k + 1;
        WHILE l > 1 {
            Restriction in y-direction; AND l := l - 1;
            Calculate the right hand side;
        }
        Relaxation;
        WHILE l < n - k + 1 {
            l := l + 1 AND Prolongation in y-direction;
            Calculate the right hand side;
            Relaxation;
        }

        Step 1.2: Changing k
        Restriction in y-direction; AND l := n - k;
        Calculate the right hand side;
        k := k + 1; AND Prolongation in x-direction;
        Calculate the right hand side;
        Relaxation;
    }

    Step 2: Way in y-direction
    analogously
}


Observe that this cycle satisfies the Restriction-Prolongation-Rule. 
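The order in which Step 1 of the Q-cycle visits the levels $(k,l)$ can be traced with a few lines of code. The following is only a sketch of the control flow: the function name `q_cycle_step1` and the operation labels are hypothetical, no grid data are touched, and each entry records the level reached after the operation.

```python
def q_cycle_step1(n):
    """Return the list of (operation, k, l) performed in Step 1 of the Q-cycle."""
    trace = []
    k = 1
    while k < n:
        # Step 1.1: V-cycle in the y-direction for fixed k
        l = n - k + 1
        while l > 1:
            l -= 1
            trace.append(("restrict_y", k, l))
            trace.append(("rhs", k, l))
        trace.append(("relax", k, l))
        while l < n - k + 1:
            l += 1
            trace.append(("prolong_y", k, l))
            trace.append(("rhs", k, l))
            trace.append(("relax", k, l))
        # Step 1.2: changing k
        l = n - k
        trace.append(("restrict_y", k, l))
        trace.append(("rhs", k, l))
        k += 1
        trace.append(("prolong_x", k, l))
        trace.append(("rhs", k, l))
        trace.append(("relax", k, l))
    return trace


relax_levels = [(k, l) for (op, k, l) in q_cycle_step1(3) if op == "relax"]
```

For $n = 3$ the relaxations take place on the levels $(1,1)$, $(1,2)$, $(1,3)$, $(2,2)$, $(2,1)$, $(2,2)$, $(3,1)$, and every visited level satisfies $k + l \le n + 1$.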


The Q-cycle can be implemented in an efficient way. This means that the number 
of operations of one Q-cycle is proportional to the number of grid points. Observe 
that it is enough to find an implementation such that the number of operations of 
every procedure on the actual level is proportional to the number of grid points of the 
actual level. Except for the procedure Calculation of the right hand side, it is simple 
to see how to do this. 

In the case of the discretization with semi-orthogonality, the Calculation of the right
hand side is similar to the full grid case.

Let us assume that at the beginning of the Q-cycle, for $(x, y) \in \mathring{\mathcal{D}}_n$,

$$F_{(x,y)} = F_{(x,y)}^{T(x),\, n - T(x) + 1}. \qquad (8)$$
Now, we do the Calculation of the right hand side in the multilevel cycle in the
following way. After a restriction in x-direction we use the equation 


^{x,y) ^{x,y) + 2 ^{x,y+2-^)) U , j 

in the Calculation of the right hand side. After a prolongation in x-direction we use 
the equation 

pk,l _ pk,l-l _ \ (pk,l I pk,l \ I pj, (p-.k,! _ f,k,l-l k,l-l\ 

Similar equations must be used after the restriction and prolongation in y-direction. 
At the end of one Q-cycle equation (8) is correct again. 


5 Numerical Results 


Numerical Example 1 (Spectral Radius of the Q-cycle) 
Let e > 0. Then, the bilinear form 


a ; X ^ 2 ^^) ^ 



Vn d{x, y) 


is $\mathring W_2^1(\Omega)$-elliptic. We are interested in the spectral radius of the Q-cycle iteration
matrix on the regular sparse grid $\mathcal{D}_n$. Table 1 shows the approximate spectral radius.
It is very small, independently of $n$ and $\varepsilon$.


    ε        0.001   0.01    0.1     1       10      100     1000

    n = 3    0.1     0.03    0.02    0.01    0.005   0.02    0.1
    n = 4    0.08    0.002   0.01    0.002   0.002   0.01    0.06
    n = 5    0.01    0.02    0.005   0.002   0.005   0.01    0.01
    n = 6    0.01    0.01    0.005   0.002   0.01    0.005   0.01
    n = 7    0.01    0.01    0.003   0.01    0.01    0.01    0.01
    n = 8    0.01    0.01    0.01    0.01    0.01    0.01    0.01

Table 1: Approximate spectral radius of the Q-cycle


Numerical Example 2 (Convergence of the discretization with semi-orthogonality) 
Let us look to the domain 

T = |( 2 ;, y) g] 0, lp|0 < a; < 1 and 0.5 • (1-H sin(7r • a;)) > ?/> rr • 0.25 j . 
    n                                   3        4        5        6        7        8        9

    error in the 2-norm on D_n          1.5e-3   5.6e-4   1.9e-4   5.8e-5   1.8e-5   5.3e-6   1.6e-6
    ratio (level n-1 / level n)         2.0      2.8      3.0      3.2      3.3      3.4      3.4

    error in the ∞-norm on D_n          5.4e-3   1.9e-3   6.0e-4   1.9e-4   5.9e-5   1.8e-5   5.2e-6
    ratio (level n-1 / level n)         2.0      2.9      3.1      3.2      3.2      3.3      3.4

Table 2: Convergence of the discretization with semi-orthogonality and η = 1


The function 

u{x,y) = {l.O—exp{x/rj)) • {1.0—exp{y/r))) 

is the solution of the equation u G 1 ^ 2 ^^) 
and 

a{u, v) = 0 for all v G hFj (^) ( 1 ) 

with Dirichlet boundary conditions. Let us 
map the domain ^ by a smooth mapping 
onto the unit square. 

This gives a transformed elliptic equation of equation (1) on the unit square. Now, 
we can solve this equation by the discretization with semi-orthogonality. Thus, we 
get a discrete solution Uh of the equation ( 1 ). Figure 4 shows an adaptive sparse grid 
with 1220 grid points. There are more points on the left side of the domain, because 
u is not very smooth for small x. 



Figure 4. Adaptive sparse grid on the domain 
$ for T] = 0.1 


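We use the following discrete norms:

$$\|w\|_{\infty,\mathcal{D}_n} := \max_{z \in \mathcal{D}_n} |w(z)| \quad \text{and} \quad \|w\|_{2,\mathcal{D}_n} := \sqrt{ \frac{\sum_{z \in \mathcal{D}_n} |w(z)|^2}{|\mathcal{D}_n|} }.$$

Table 2 leads to the conjecture that $u_h$ converges to $u$ with the order

$$\|u - u_h\|_{\infty,\mathcal{D}_n} = O(h^2 \log h^{-1}) \quad \text{and} \quad \|u - u_h\|_{2,\mathcal{D}_n} = O(h^2 \log h^{-1}).$$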
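The conjectured order can be compared with the ratios in Table 2 by a short calculation. If the error behaves like $C h^2 \log h^{-1}$ with $h = 2^{-n}$, then the ratio of the errors on the levels $n-1$ and $n$ is $4(n-1)/n$, which tends to 4 only slowly. The following sketch tabulates this ratio next to the reported 2-norm ratios; the function name `expected_ratio` is hypothetical, and the comparison is only a plausibility check under this assumed error model, not a proof.

```python
def expected_ratio(n):
    """Error ratio e(n-1)/e(n) if e(n) = C * h^2 * log(1/h) with h = 2^-n."""
    return 4.0 * (n - 1) / n


# Ratios of the 2-norm errors as reported in Table 2 (levels 4 to 9).
reported = {4: 2.8, 5: 3.0, 6: 3.2, 7: 3.3, 8: 3.4, 9: 3.4}

for n, ratio in reported.items():
    print(f"n = {n}: reported {ratio:.1f}, model {expected_ratio(n):.2f}")

largest_gap = max(abs(expected_ratio(n) - r) for n, r in reported.items())
```

The reported ratios stay clearly below 4 and follow the model to within about 0.2, which is what a logarithmic factor in the error would produce.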
ERROR AND COMPLEXITY ANALYSIS FOR A
COLLOCATION-GRID-PROJECTION PLUS PRECORRECTED-FFT
ALGORITHM FOR SOLVING POTENTIAL INTEGRAL EQUATIONS WITH
LAPLACE OR HELMHOLTZ KERNELS

J. R. Phillips* 

Dept, of Electrical Engineering and Computer Science 
Massachusetts Institute of Technology 
Cambridge, MA 02139. 

SUMMARY 

In this paper we derive error bounds for a collocation-grid-projection scheme tuned
for use in multilevel methods for solving boundary-element discretizations of potential
integral equations. The grid-projection scheme is then combined with a precorrected-FFT
style multilevel method for solving potential integral equations with $1/r$ and $e^{ikr}/r$
kernels. A complexity analysis of this combined method is given to show that for
homogeneous problems, the method is order $n \log n$ nearly independent of the kernel.
In addition, it is shown analytically and experimentally that for an inhomogeneity
generated by a very finely discretized surface, the combined method slows to order
$n^{4/3}$. Examples are given to show that the collocation-based grid-projection
plus precorrected-FFT scheme is competitive with fast-multipole algorithms when
considering realistic problems and $1/r$ kernels, but can be used over a range of spatial
frequencies with only a small performance penalty.

1. INTRODUCTION 


In the last several years, there has been a significant increase in the volume of
research on discretized integral equation, or boundary-element, solvers [1].
Boundary-element methods have always been an appealing approach for solving exterior
problems, because such methods only discretize domain boundaries and not exterior
volumes. The main difficulty with boundary-element methods is that they generate
dense matrices which are expensive to solve. What has generated renewed interest
in boundary-element methods is that the combination of iterative solvers, such as
Krylov-subspace methods, and matrix sparsification techniques, like fast-multipole
and multilevel methods, has been used to create very fast boundary-element
codes [2, 3, 4].

Fast-multipole based codes for solving potential problems with $1/r$ kernels are
now commonly used in a variety of engineering applications [5]. What is now of
primary research interest is developing sparsification procedures for boundary-element
matrices which are capable of solving potential problems with relatively general
kernels, at least including $1/r$ and $e^{ikr}/r$ for a wide range of $kr$ [6, 7, 8, 9, 3, 10]. Such a
direction parallels the recent work on using multigrid methods to solve the Helmholtz
equation [11, 12].

*This work was supported by ARPA contracts N00174-93-C-0035 and J-FBI-92-196 as well 
as grants from the Consortium for Superconducting Electronics, the Semiconductor Research 
Corporation (SJ-558), IBM and Digital Equipment Corporation, and an NDSEG fellowship. 
In this paper we analyze errors and complexity for a general collocation-grid-projection
scheme for use in a precorrected-FFT style algorithm for solving integral
equations with general kernels. In the next section, we briefly review the boundary- 
element method for solving potential integral equations and give a brief description 
of the precorrected-FFT approach. In Section 3, which contains the main theoretical 
results of this paper, we give rigorous error bounds for a collocation-based grid- 
projection scheme. In Section 4, we address the issues of algorithm computational 
complexity, and analyze the homogeneous case as well as one type of inhomogeneity. 
In Section 5, we give some experimental results to show that the collocation-based 
grid-projection plus precorrected-FFT scheme is competitive with fast-multipole 
algorithms when considering realistic problems and 1/r kernels, but can be used 
over a range of spatial frequencies with only a small performance penalty. 

2. PROBLEM FORMULATION AND THE PRECORRECTED-FFT ALGORITHM 


Laplace or Helmholtz problems, with a combination of Neumann or Dirichlet
boundary conditions, can be cast into an integral equation form using monopole,
dipole or combined-layer potentials [13]. In the combined-layer case, the potential is
represented by

$$\psi(x) = \int_S \{ G_n(x,x') - i\eta\, G(x,x') \}\, \sigma(x')\, da', \qquad x \in S, \qquad (1)$$

where $x, x' \in \mathbb{R}^3$, $S$ is a multiply-connected two-dimensional surface in $\mathbb{R}^3$,
$G(x,x') = e^{ik\|x-x'\|} / 4\pi\|x-x'\|$ is the Green's function for the Laplace ($k = 0$) or Helmholtz
equation, $G_n$ is the surface normal derivative of $G$ at $x'$, $\sigma(x')$ is the combined-layer
density often referred to as a charge density, and $\eta$ is a complex scalar which depends
on $k$.

For each point $x$ for which $u(x)$ is specified, the charge density satisfies

—^ + [ Gn{x,x')a{x')da' — ir) [ G{x,x')a{x')da' = u{x) (2) 

2 Js Js 

and for each point $x$ where $u_n(x)$ is specified the charge density satisfies

Gn' {x, x')a{x')da' -F jg = Un{x). (3) 


Boundary-Element Discretization 


Boundary-element methods are commonly used to solve potential integral equations
like (2) and (3), but are easiest to describe when considering the simple first-kind
integral equation of the form

$$\psi(x) = \int_S \sigma(x')\, G(x,x')\, da', \qquad x \in S. \qquad (4)$$

To compute an approximation to $\sigma$, the boundary-element approach is to consider an
expansion of the form

$$\sigma(x) \approx \sum_{i=1}^{n} q_i h_i(x), \qquad (5)$$

where $h_1(x), \dots, h_n(x) : \mathbb{R}^3 \to \mathbb{R}$ are a set of compactly supported expansion functions,
and $q_1, \dots, q_n$ are the unknown expansion coefficients. The expansion coefficients are
then determined by requiring that they satisfy a Galerkin condition of the form

$$Pq = p, \qquad (6)$$

where $P \in \mathbb{R}^{n \times n}$ is given by

$$P_{ij} = \int h_i(x) \int h_j(x')\, G(x,x')\, da'\, da. \qquad (7)$$

The approach used in many engineering applications is to approximate the surface S 
with N planar quadrilateral and/or triangular panels, in which case the support for 
hi is just a single panel. 


The precorrected-FFT technique 


If Gaussian elimination is used to solve (6), $O(n^3)$ operations and $O(n^2)$ storage
are required. Typical engineering problems may have thousands or tens of thousands
of panels, so that Gaussian elimination is not a feasible approach. In [14, 15] it was
shown that the precorrected-FFT method described below is an efficient approach
to solving (6), reducing the number of operations and memory required to nearly
$O(n \log n)$. As can be seen from Fig. 1, for the solution of Laplace's equation in typical
engineering geometries, the precorrected-FFT method needs less memory than the fast
multipole algorithm in all six examples and less computation time in five of them.

Consider solving (6) by using a Krylov-subspace technique such as GMRES [16].
The dominant costs of such an algorithm are in calculating the entries of $P$ using
(7) before the iterations begin, and performing $n^2$ operations to compute the dense
matrix-vector product on each iteration. To develop a faster approach to computing
the matrix-vector product, after discretizing the problem into n panels, consider 
subdividing the problem domain into an array of small cubes so that each small 
cube contains only a few panels. Several sparsification techniques for P are based on 
the idea of directly computing only those portions of Pq associated with interactions 
between panels in neighboring cubes. The rest of Pq is then somehow approximated 
to accelerate the computation [2]. 

One approach to computing distant interactions is to exploit the fact that 
evaluation points distant from a cube can be accurately computed by representing 
the given cube’s charge distribution using a small number of weighted point charges 
[17]. $Pq$ can then be approximated in four steps: (1) project the panel charges onto
a uniform grid of point charges, (2) compute the grid potentials due to grid charges,
(3) interpolate the grid potentials onto the panels, and (4) directly compute nearby
interactions. This process is summarized in Figure 2.
    Example         Speed   Memory

    micromotor      0.68    0.81
    cube            0.73    0.31
    woven bus       0.63    0.42
    bus crossing    0.43    0.26
    via             1.42    0.37
    DRAM cell       0.80    0.73

Figure 1: Comparison of performance of FFT-based to multipole-based codes
for 1/r Green function. "Speed" is the ratio of the matrix-vector product time of the
precorrected-FFT method to that of the fast multipole based method, "memory" the ratio
of required storage.


Figure 2: 2-D pictorial representation of the four steps of the precorrected-FFT
algorithm. Interactions with nearby panels (in the grey area) are computed directly,
interactions between distant panels are computed using the grid.


There are several possible approaches to computing the grid charge. Analysis
of one possible scheme is presented in Section 3. When the grid charges have been
determined, their potentials at the grid points must be computed. The potential $\psi(x)$
at a point $x = (x, y, z)$ is the sum of the potentials from all the grid charges $q(x')$,

$$\psi(x) = \sum_{x'} g(x, x')\, q(x'). \qquad (8)$$

The free-space Green function $g(x, x') = g(x - x', y - y', z - z')$ depends only on the
relative difference between the points $x$ and $x'$. Therefore, because of the regular grid,
the computation of the grid-charge potentials at the grid points is a three-dimensional
discrete convolution. This convolution can be rapidly computed by using the Fast
Fourier Transform [18], requiring $O(N \log N)$ operations. Once the grid potentials
have been computed, they must be interpolated to the panels.
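The equivalence between the sum (8) on a regular grid and a discrete convolution can be illustrated in one dimension. The following is a minimal sketch and not the implementation used in this paper: it uses a one-dimensional grid with unit spacing instead of a three-dimensional one, a small recursive radix-2 FFT written out by hand, and a $1/r$ kernel whose singular self term is set to zero. All function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def ifft(a):
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def kernel(d):
    """1/r kernel on integer grid offsets; the self term is set to zero (assumption)."""
    return 0.0 if d == 0 else 1.0 / abs(d)


def potentials_direct(q):
    """Direct evaluation of (8): O(m^2) operations for m grid charges."""
    m = len(q)
    return [sum(kernel(i - j) * q[j] for j in range(m)) for i in range(m)]


def potentials_fft(q):
    """Same potentials via zero-padded circular convolution: O(m log m) operations."""
    m = len(q)
    size = 1
    while size < 2 * m:
        size *= 2
    g = [0.0] * size
    for d in range(-(m - 1), m):
        g[d % size] = kernel(d)       # kernel values for all offsets that can occur
    qp = list(q) + [0.0] * (size - m)
    G, Q = fft(g), fft(qp)
    phi = ifft([x * y for x, y in zip(G, Q)])
    return [phi[i].real for i in range(m)]


q = [1.0, -2.0, 0.5, 3.0, 0.0]
direct = potentials_direct(q)
fast = potentials_fft(q)
```

The zero padding to at least twice the grid length prevents the periodic wrap-around of the circular convolution from contaminating the result, so both routines agree to rounding error.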

In the computation of panel potentials due to grid charges, the portions of $Pq$
associated with neighboring cube interactions have already been computed, though
this close interaction has been poorly approximated in the projection/interpolation.
Before computing a better approximation, it is necessary to remove the contribution
of the inaccurate approximation. It is possible to construct a "precorrected" direct
interaction operator $P^{C}_{a,b}$, which consists of the direct interaction operator $P_{a,b}$ for
neighboring cells $a$ and $b$, with the errors introduced by the grid charges exactly
subtracted out. When used in conjunction with the grid charge representation,
$P^{C}_{a,b}$ results in exact calculation of the interactions between panels which are close.
Assuming that the $Pq$ product will be computed many times in the inner loop of an
iterative algorithm, $P^{C}_{a,b}$ will be expensive to initially compute, but will cost no more
to subsequently apply than $P_{a,b}$.
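The effect of the precorrection can be seen in a small one-dimensional toy model. The following sketch is not the algorithm of this paper in detail: it assumes point charges instead of panels, a unit-spaced one-dimensional grid, linear interpolation weights as the projection operator, dense loops instead of the FFT, and a $1/r$ kernel with the self term set to zero. All names are hypothetical.

```python
import math


def kernel(r):
    """1/r kernel; the singular self term is set to zero in this toy model."""
    return 0.0 if r == 0 else 1.0 / abs(r)


def projection(points, n_grid):
    """W[g][i]: linear-interpolation weight of point i on grid node g (unit spacing)."""
    W = [[0.0] * len(points) for _ in range(n_grid)]
    for i, x in enumerate(points):
        g = min(int(math.floor(x)), n_grid - 2)
        t = x - g
        W[g][i] += 1.0 - t
        W[g + 1][i] += t
    return W


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def grid_product(points, q, n_grid):
    """Steps (1)-(3): project, compute grid potentials, interpolate back."""
    W = projection(points, n_grid)
    H = [[kernel(a - b) for b in range(n_grid)] for a in range(n_grid)]
    return matvec(transpose(W), matvec(H, matvec(W, q)))


def exact_product(points, q):
    return [sum(kernel(xi - xj) * qj for xj, qj in zip(points, q)) for xi in points]


def precorrected_product(points, q, n_grid, cutoff):
    """Grid product plus step (4): near pairs get (exact - grid approximation)."""
    m = len(points)
    y = grid_product(points, q, n_grid)
    for j in range(m):
        unit = [1.0 if i == j else 0.0 for i in range(m)]
        col = grid_product(points, unit, n_grid)   # grid potential of a unit charge at j
        for i in range(m):
            if abs(points[i] - points[j]) <= cutoff:
                exact = kernel(points[i] - points[j])
                y[i] += (exact - col[i]) * q[j]    # precorrected entry times charge
    return y


points = [0.3, 0.9, 2.4, 5.1, 7.75]
q = [1.0, -2.0, 0.5, 3.0, 1.5]
exact = exact_product(points, q)


def max_error(y):
    return max(abs(a - b) for a, b in zip(y, exact))


grid_only = grid_product(points, q, 9)
near_corrected = precorrected_product(points, q, 9, 1.5)
all_corrected = precorrected_product(points, q, 9, float("inf"))
```

In this model the grid-only product is dominated by the badly approximated near interactions; after the precorrection only the small errors of the distant interactions remain, and if every pair is treated as near the exact product is recovered.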

3. GRID-PROJECTION SCHEME

In this section, we describe and analyze accurate operators for projecting charge
densities onto the grid and for interpolating potentials from the grid, the two problems
being equivalent as noted in [3].


The Collocation Grid-projection and Interpolation Operators 


Consider approximating the potential of a charge distribution $\rho(x)$ by a set of
$N_G$ point charges, $Q_j$, $j = 1 \dots N_G$, which are positioned at points $x_j$. Suppose also
that both the point charges and the charge distribution lie entirely inside a sphere of
radius $a$ centered at the origin. We will require that the potential of the point charges
and the potential of the true charge density match at a set of $N < N_G$ collocation
points $x_{c,k}$, $k = 1 \dots N$, on a closed surface which encompasses the sphere of radius $a$.
That is, for each $k$,

$$\sum_j Q_j\, G(x_j, x_{c,k}) = \int \rho(x')\, G(x', x_{c,k})\, dx',$$

where $G(x, x')$ is the relevant Green's function. It will be convenient if the surface
is chosen to be a sphere of radius $r_c > a$, and the collocation points are chosen to
be the abscissas of a quadrature rule on the sphere. Integration rules of arbitrary
order on a sphere can be constructed by product techniques, but more efficient
non-product rules exist [19] which will generally be sufficient for our purposes. By careful
selection of the quadrature rule, at least for the orders we have checked, it is possible
to ensure the grid charge does not substantially exceed the net cube charge. That is,
for appropriately selected quadrature rules,

$$\sum_j |Q_j| < \kappa \int |\rho(x')|\, dx', \qquad (9)$$

where $\kappa$ is a constant independent of order.

In addition to constructing operators that represent panel charges by grid charges, 
it is necessary to construct operators, of comparable accuracy, that interpolate 
potentials at the grid points to the charge panels. 

Lemma 1. If $W$ is an operator which projects charge onto a grid, then $W^T$ is an
operator which interpolates potential at grid points onto charge coordinates, and $W$
and $W^T$ have comparable accuracy.

Proof. Suppose that a unit charge at the point Xq is represented by the vector of 
grid charges Qg. The approximate potential #(y) at a point y is given by 

i 

where Xi is the position of the zth charge, and g{xi, y) the Green function. Conversely, 
suppose there is a unit charge at y, and the potential at xq is to be computed. Then, 
if V is the interpolation operator, 

^(^o) = Y,'^(^o,Xi)g{xi,y) = Vg 
For a symmetric Green function, $\psi(x_0) = g(x_0, y) = g(y, x_0) = \psi(y)$, so that

^{xo) - ^(xo) = Vg- ^(xo) = {g^v^Y - ^(y) = '^{v) - ^(y) = - ^(y) 

if we require $V = q_g^T$. In other words, if $W$ is an operator which represents a charge
at point $x_0$ by grid charges, $W^T$ interpolates potential at the grid points onto the
point $x_0$, and $W$ and $W^T$ have the same order of accuracy. □


Error analysis 


First we establish error bounds for the approximation of a panel charge potential by 
grid charges. 

Lemma 2. Suppose a grid-charge representation of a charge distribution $\rho(x)$,
of total charge $Q = \int |\rho(x')|\, dx'$, lying inside a sphere $S(a)$ of radius $a$ centered at
the origin, has been constructed. Assume the grid charges $Q_j$ are given at points
$x_j$, $j = 1 \dots N_G$, and define $Q_G = \sum_j |Q_j|$. The error $\phi_e$ in the grid-charge
approximation of the potential in the $k = 0$ case satisfies

$$|\phi_e| < \frac{Q + Q_G}{r_m} \left( \frac{a}{r_m} \right)^{M+1} \frac{(M+1)^2 + 1}{1 - (a/r_m)}, \qquad (10)$$

where $M$ is the order of the quadrature rule and $r_m > a$ is the distance of the nearest
potential-evaluation point to the origin.

Proof. The multipole expansion of the potential $\phi$ of the charge distribution is [20]

$$\phi(r,\theta,\varphi) = 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{2l+1} \frac{1}{r^{l+1}} \left[ \int_{S(a)} r'^{\,l} \rho(x')\, Y_{lm}^{*}(\theta',\varphi')\, dx' \right] Y_{lm}(\theta,\varphi). \qquad (11)$$

Similarly, the multipole expansion of the grid-charge potential $\phi_g(r,\theta,\varphi)$ is

$$\phi_g(r,\theta,\varphi) = 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{2l+1} \frac{1}{r^{l+1}} \left[ \sum_{j=1}^{N_G} Q_j\, r_j^{\,l}\, Y_{lm}^{*}(\theta_j,\varphi_j) \right] Y_{lm}(\theta,\varphi). \qquad (12)$$

Let $(r_c, \theta_k, \varphi_k)$ denote the $k$th collocation point, $k = 1 \dots N$, on the surface of the
sphere of radius $r_c$. Assume that the $(\theta_k, \varphi_k)$ are the abscissas of a quadrature rule
on a sphere such that the rule exactly integrates spherical polynomials of degree at
least $2M$. Let $w_k$, $k = 1 \dots N$, denote the quadrature rule weights corresponding to a
sphere of radius unity.

At a collocation point, the error in the potential, <j)e{'<'ciQki<j>k) = 4‘{f’c,0ki4>k) — 
4>g{fc, ^ki <Pk) is zero, so we may write 


M I 

E E 

i=0 m=—l ' c 
00 




gi,mTlm{^ki 'fk) — 


^ 1 

in Y 


1 


l=M+l m=—l 


2l-\-l 


/. Ng 

r''pix’)Y,i,{ff, Mdx' - 2 : Q,Tjy,;(e„ M 

jz=l 


(13) 

i^lm{^ki f^k') 
for $k = 1 \dots N$, where $q_{l,m}$ is given by


Ql,m 


Att 

21 + 1 



Multiplying each side of (13) by WkYy^,{6k,(t>k) and summing over k leads to a 
simplified form. From the identity 

and the quadrature rule for selecting Wk,0k,4^k-i it then follows that 

WkYi*^,{ek, ^k)yim{0k, <pk) = SiirSmm' 

k 

for I + V <= 2M. Therefore, 


1 


(14) 


N oo I 1 1 r /■ 

fe=l l=M+l m=-l yS{a} 

yi*m'(^k, (l>k)Ylm{dk, (Pk) 


Ng 

1=1 


The addition theorem for spherical harmonics states 


47r 


21 + 


Y E « = -P<(cos 7) 


m=—l 


where 7 is the angle between {O', (p') and {6, cp) and Pi(cos 7) a Legendre polynomial. 
The addition theorem provides a bound 


47r 

21 + 1 


i 


E YU0i.'f>l)Y,.U0k,<l>k) 


< 1 


since |p(cos7)| < 1, as well as a bound on the magnitudes of the spherical harmonics, 

\Yl„.{0,Ak)\ < 


hv +1 


Att 


Since ^ 

/„> E <P,)dx' < Qa‘ 

Js{a) 21 + 1 

and using the additional fact J2k '^k = 47r, we can bound the sum of the infinite series 
on the right-hand side of (14) to obtain a bound on the {I', m!) multipole coefficient 
of the error 
(15)




_ oo 

.,„.|<>-;'(Q + QG)(4,r)^(2i' + l)/47r (-)' 


or 


\qi 


•,m'\ < + go)(4x)\/(2i' + l)/47r(f) 


l=M+l 

M+1 

Tc 1 - (a/rc) 


= ?/'• 


(16) 


Using the multipole expansion truncation bound in [2] a bound can be derived for 
the error in the potential, |0e|) 


M - 

k.l < E ;m\/(2'' +1)/“’^ + 


Q + Qg 


1 




f-y 

r l-(a/'r) 


(17) 


After substituting the expression for qi from (16), (17) becomes 


M 






Q + Qg fO- 

) 

r 'T 1 — (a/r) 


• ( 18 ) 


Depending on the relative size of Tc and r, we may obtain two bounds on the 
magnitude of the error. 


< 


Q + Qg 


(M + 1)^ 


l-(a/rc) r 1 - (a/r)J 


rc<r 


(19) 


and 


l</>e| < 


Q + Qg 


o 


M+l 


(M + 1)^ 

l-(«Ac) r l-(a/r)J 


Tr > r. 


( 20 ) 


In the potential evaluation process, the worst-case error will occur at the point of 
smallest r. If we require that Vc > Vm, the lemma is proved. □ 

We now have the main result of the paper.

Theorem 1. Suppose the potential of a point charge is given by $1/r$. The grid-based
technique for evaluating, outside a sphere of radius $r_m$, the potential of a charge
density of total charge magnitude $Q$, located inside a sphere of radius $a$, has error $\phi_e$
bounded by

$$|\phi_e| < (1 + \kappa)\, \frac{Q}{r_m} \left( \frac{a}{r_m} \right)^{M+1} \frac{(M+1)^2 + 1}{1 - (a/r_m)}, \qquad (21)$$

where $2M$ is the order of a quadrature rule on a sphere.

Proof. The theorem follows directly from Lemmas 1 and 2. □


Helmholtz Kernels 

Suppose that outside a sphere of radius $a$, a function $\psi$ satisfying the Helmholtz
equation is represented by a multipole expansion whose moments up to order $N$ vanish,

$$\psi(r,\theta,\phi) = \sum_{l=N+1}^{\infty} \sum_{m=-l}^{l} p_{lm}\, h_l^{(1)}(kr)\, Y_{lm}(\theta,\phi) \qquad (22)$$

where $k$ is the wavenumber and $h_l^{(1)}(kr) = j_l(kr) + i y_l(kr)$ is the first-kind Hankel function
of order $l$.

For such a potential the following lemma exists [7]: 

Lemma 3. For $N > ka$ and any $r > a$, there exists a $c > 0$ such that

$$|\psi(r,\theta,\phi)| < c \left(\frac{a}{r}\right)^{N+1}. \qquad (23)$$


Theorem 2. Suppose the potential of a charge is given by $e^{ikr}/r$. If the collocation
points in the grid-charge assignment are chosen to be the abscissas of a quadrature
rule which exactly integrates spherical harmonics of order $< 2ka$, i.e.,

$$M > ka \qquad (24)$$

for a quadrature rule of order $2M$, then the grid-based technique for evaluating, outside
a sphere of radius $r_m$, the potential of a charge density of total charge magnitude $Q$,
located inside a sphere of radius $a$, has error $\phi_e$ bounded by


Q I ® \M+1 


He] < C(1 + «) — ( — ) 
‘ m • 


(M + 1)^ + 1 

1 - (a/rm) 


(25) 


Proof. Given the conditions of Lemma 3, the proof follows exactly as for Theorem 1. □


Applications and competing approaches 


While the grid operators described here were developed with the precorrected- 
FFT technique in mind, they can be incorporated into any multi-level scheme [10, 3]. 
The representation described here has two advantages which allow it to be efficient.
First, because of the regular spacing of the grid charges, fast {0{P)logl, where I 
is the order of the quadrature rule) translation and potential evaluation operators 
exist. It appears that in the approach in [10], only the 0(P) direct operators are 
available. Secondly, the sharing of grid charges between computational cells allows
for a reduction in the total number of coefficients needed to represent the potential
in each cell of the computational domain. That is, if there are $N$ cells in the domain,
and $p^3$ grid charges are used to represent the potential in each cell, then, for large $N$
where we may neglect edge effects, the total number of grid charges is only $N(p - 1)^3$,
a significant reduction for small $p$. For most engineering problems, we expect $p < 5$,
so the sharing effect will still be significant. An additional advantage of the grid-based
approach is that the potential throughout the domain can be obtained at little
additional cost once the panel charges have been determined [21].
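
The sharing argument can be checked with a small count. The sketch below is illustrative only: it assumes a cubic array of cells, each carrying a p x p x p grid whose boundary points coincide with those of its neighbours, and the function name `grid_charge_counts` is hypothetical.

```python
def grid_charge_counts(cells_per_side, p):
    """Return (unshared, shared) grid-charge counts for a cube of cells.

    Assumption: cells_per_side**3 cubic cells, each with a p x p x p grid,
    and adjacent cells share the grid points on their common face.
    """
    n_cells = cells_per_side ** 3
    # Without sharing, every cell stores its own p^3 grid charges.
    unshared = n_cells * p ** 3
    # With sharing, each axis holds cells_per_side * (p - 1) + 1 distinct points.
    points_per_side = cells_per_side * (p - 1) + 1
    shared = points_per_side ** 3
    return unshared, shared


if __name__ == "__main__":
    for p in (3, 4, 5):
        unshared, shared = grid_charge_counts(100, p)
        print(p, unshared, shared, shared / unshared)
```

For a large number of cells the shared count approaches N(p - 1)^3, the extra points being the edge effects neglected in the text.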

4. COMPLEXITY ANALYSIS 


We first consider the case where the panel charges are evenly distributed 
throughout space. 

Theorem 3. For a homogeneous distribution of $N$ panels, the precorrected-FFT
method requires $O(N \log N)$ operations to perform a potential calculation.

Proof. Assume space has been divided into an array of $M \times M \times M$ cells, and
that there are about $N = n^3$ panels evenly distributed throughout the $M \times M \times M$
cube, so that there are about $(n/M)^3$ panels in each computational cell. Finally,
assume that the grid in each cell is a $p \times p \times p$ array. There are three components
in the cost of the precorrected-FFT method. We assume that any costs associated
with forming the grid projection operators are negligible, since these calculations
need be performed only once, not at each GMRES iteration.


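• Cost of direct interactions

$$C_D = \alpha M^3 \left(\frac{n}{M}\right)^6 = \alpha \frac{n^6}{M^3}$$

• Cost of grid projection and interpolation

$$C_I = \gamma M^3 \left(\frac{n}{M}\right)^3 p^3 = \gamma n^3 p^3$$

which is independent of $M$.

• Cost of the FFT

$$C_F = \beta p^3 M^3 \log_2 Mp$$

If we assume that $M$ is proportional to $n$, then the total cost of the algorithm is
$O(n^3 + n^3 \log_2 n) = O(N \log_2 N)$. □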


For the boundary-integral methods considered in this paper, however, the panels 
are usually not homogeneously distributed. 

Theorem 4. For a single closed surface at fixed $k$, the precorrected-FFT method
requires $O(N^{6/5} \log N)$ operations to perform a potential calculation, where $N$ is the
number of panels.

Proof. Again assume space has been divided into an array of $M \times M \times M$ cells, and
that the surface measures about $n$ panels wide along each side of the $M \times M \times M$ cube,
so that there are about $N = n^2$ panels total, and $(n/M)^2$ panels in each computational
cell which is occupied. About $M^2$ cells will have panels. To determine the complexity
of the method, the optimal number of cells $M$ must be determined as a function of
problem size, $n$. The analysis proceeds as above:





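• Cost of direct interactions

$$C_D = \alpha M^2 \left(\frac{n}{M}\right)^4 = \alpha \frac{n^4}{M^2}$$

• Cost of grid projection and interpolation

$$C_I = \gamma M^2 \left(\frac{n}{M}\right)^2 p^3 = \gamma n^2 p^3$$

which is independent of $M$.

• Cost of the FFT

$$C_F = \beta p^3 M^3 \log_2 Mp$$

Neglecting for the purposes of optimization the logarithmic factor, the total cost is

$$C_T = \alpha \frac{n^4}{M^2} + \gamma n^2 p^3 + \beta p^3 M^3$$

which when optimized with respect to $M$ gives

$$M = \left( \frac{2 \alpha n^4}{3 \beta p^3} \right)^{1/5}$$

so that

$$C_D \propto n^{12/5} = O(N^{6/5})$$

$$C_I \propto n^2 = O(N)$$

$$C_F \propto n^{12/5} \log_2 np = O(N^{6/5} \log_2 Np)$$

□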
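
The optimization over $M$ in the proof above can be checked numerically. The following is a minimal sketch, assuming unit values for the cost constants $\alpha$, $\beta$, $\gamma$ (which the paper does not specify) and hypothetical function names; it compares the closed-form optimum with a brute-force search over integer $M$ and confirms the $n^{12/5}$ growth of the $M$-dependent cost.

```python
def total_cost(M, n, p, alpha=1.0, beta=1.0, gamma=1.0):
    """Surface-case cost model with the logarithmic FFT factor neglected."""
    return alpha * n ** 4 / M ** 2 + gamma * n ** 2 * p ** 3 + beta * p ** 3 * M ** 3


def optimal_cells(n, p, alpha=1.0, beta=1.0):
    """Closed-form minimizer M = (2 alpha n^4 / (3 beta p^3))^(1/5)."""
    return (2.0 * alpha * n ** 4 / (3.0 * beta * p ** 3)) ** 0.2


def brute_force_cells(n, p, alpha=1.0, beta=1.0, gamma=1.0):
    """Integer M in [1, 4n] with the smallest modelled cost."""
    return min(range(1, 4 * n + 1),
               key=lambda M: total_cost(M, n, p, alpha, beta, gamma))


if __name__ == "__main__":
    n, p = 200, 3
    print("closed form:", optimal_cells(n, p))
    print("brute force:", brute_force_cells(n, p))
    # Doubling n should multiply the M-dependent cost by 2^(12/5).
    c1 = total_cost(optimal_cells(n, p), n, p, gamma=0.0)
    c2 = total_cost(optimal_cells(2 * n, p), 2 * n, p, gamma=0.0)
    print("growth factor:", c2 / c1, "expected:", 2 ** 2.4)
```

Because the cost model is convex in $M$ for $M > 0$, the best integer choice lies within one unit of the closed-form value.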


In this analysis, we have assumed that $p$ is constant. For a given problem, when
solving the Helmholtz discretization as the frequency increases, generally the number
of panels must increase to retain a fixed number of panels per wavelength. However,
the size of a computational cell decreases in proportion to $1/M$, or as $n^{-4/5}$, slower than
$1/n$. Thus, for high frequencies the criterion in (24) that the order of the quadrature
rule be greater than $2k\Delta$ will be violated. We must allow $p$ to vary with $n$ to obtain
the correct complexity analysis, which gives a different complexity bound.

Theorem 5. For a single closed surface, the precorrected-FFT method with $\sqrt{N}$
proportional to $k$ requires at most $O(N^{4/3} \log N)$ operations to perform a potential
calculation, where $N$ is the number of panels.

Proof. Assume the size $R$ of the computational domain is fixed. Further, assume
a fixed number of panels per wavelength, $n \sim 1/\lambda \sim k$, is required to maintain the
solution accuracy. Then $k\Delta = kR/M \sim n/M$. The number of collocation points
necessary for order $l$ quadrature is $O(l^2)$, which is of the same order as the number
of grid charges per cell, $p^3$. Thus we have

$$p \sim \left( \frac{n}{M} \right)^{2/3}.$$

Repeating the above complexity analysis, we have

• Direct cost $C_D = O(n^4/M^2)$

• Interpolation cost $C_I = O(p^3 n^2) = O(n^4/M^2)$, same order as the direct cost

• FFT cost $C_F = O(M^3 p^3 \log_2 Mp) = O(M n^2)$

The total cost is thus $C_T = O(M n^2 + n^4/M^2)$, which when optimized for $M$ gives

$$M = O(n^{2/3}).$$

The asymptotic cost of the entire algorithm is then $O(N^{4/3} \log_2 N)$, a slight
increase over the $O(N^{6/5} \log N)$ in the case of Poisson's equation, and competitive with
two-level multipole based schemes for the Helmholtz equation [7].

We should also note that the cost of forming the grid projection operators,
$O(p^9) = O(n^2) = O(N)$, remains reasonable. □


5. COMPUTATIONAL RESULTS 


Empirical Grid Error Analysis 

In Figures 3(a) and 3(b), the errors in the potential due to the grid charge
approximation are shown for two values of the collocation sphere radius $r_c$, in the
Laplace ($k = 0$) limit. In Figure 3(a), with $r_c$ small, for all orders of approximation
the error decays slowly away from the charge distribution. Since in this case $r_c \simeq r_{\min}$,
we expect the error to behave essentially as a monopole, dying slowly away from
the origin, regardless of the order of the quadrature rule. We only expect the
order of the quadrature rule to change the constant factor in front of the error term.
Notice in Figure 3(b), where $r_c$ is considerably larger, the worst-case errors have not
changed much, as predicted by our previous analysis. The variation of error with
distance, however, changes drastically. As the collocation sphere radius is increased,
the magnitude of the low order multipole coefficients of the error decreases, and the
errors decay rapidly with distance. Note that the sharp error decay associated with
high order multipole approximation ends at about the collocation sphere radius.

In Figures 3(c) and 3(d), we consider errors in the Helmholtz equation. At low
$k$, schemes of all three orders considered still exhibit acceptable error properties (if an
acceptable worst-case error is of order 10~^ — 10~^). As k is increased, however, the 
low-order schemes become inaccurate, and the high-order scheme ($p = 5$) becomes less
accurate, though it still retains acceptable accuracy for this relatively high frequency
(at this frequency, the basic computational cell is more than a wavelength long).

Figure 3: Error in grid approximation of potential of 100 charges of random strength
$Q \in [0,1]$ located at random positions inside a cube of side length $2d$ centered at
the origin. Collocation sphere radius is $r_c = 1.5d$ (left figure), $r_c = 6d$ (right figure).
Solid line: $p = 3$, order 7 quadrature rule. Dashed line: $p = 4$, order 11 quadrature
rule. Dash-dotted line: $p = 5$, order 14 quadrature rule. Panel (b): $r_c = 6d$, $k = 0$.


Computational Examples 


First we analyze the behavior of the precorrected-FFT method as a function
of problem size, for Laplace and Helmholtz kernels. A cube is discretized into
quadrilateral panels, with $n$ panels along each side. The time required to perform
a matrix-vector product, and the memory necessary for the linear system solution,
are then tabulated for $n$ ranging from 15 to 100. For the Helmholtz problem, we
require that the discretization have 15 panels per wavelength along each side of the
cube. For a unit cell of length $\Delta$, the order $p$ of the grid representation and order $M$ of
the quadrature rule are chosen by the rules: $k\Delta < 1.75$ corresponds to $p = 3$, $M = 7$
(26-point rule); $1.75 < k\Delta < 2.75$ corresponds to $p = 4$, $M = 11$ (56-point rule);
$k\Delta > 2.75$ corresponds to $p = 5$, $M = 14$ (72-point rule). The results are shown in
Fig. 4.

Figure 4: CPU time and memory use (Mb) versus number of panels for discretized cube. x: Laplace problems. *:
Helmholtz problems, with $kn = 15$, $n$ the number of panels along a side of the cube.
Dashed line: best-fit line to Laplace data: assumed time, memory $= Cn^{\alpha}$, computed
$\alpha = 1.16$ for CPU time, $\alpha = 1.11$ for memory use.
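
The selection rules for the grid and quadrature orders can be written as a small lookup. This is only a sketch: the function name `choose_orders` is hypothetical, and the treatment of the exact boundary values 1.75 and 2.75, which the text leaves unspecified, is an assumption.

```python
def choose_orders(k_delta):
    """Return (p, M, n_points) for a cell of size Delta at wavenumber k.

    k_delta is the product k * Delta. Boundary handling is an assumption:
    the values 1.75 and 2.75 themselves are assigned to the middle rule.
    """
    if k_delta < 1.75:
        return 3, 7, 26      # p = 3, M = 7, 26-point rule
    if k_delta <= 2.75:
        return 4, 11, 56     # p = 4, M = 11, 56-point rule
    return 5, 14, 72         # p = 5, M = 14, 72-point rule


if __name__ == "__main__":
    for k_delta in (0.0, 1.0, 2.0, 3.0):
        print(k_delta, choose_orders(k_delta))
```

With $k = 0$ the Laplace case always selects the lowest-order rule, which is one reason the Helmholtz runs cost more.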

The results for Laplace’s equation follow the expected behavior very
closely. Some degree of irregular growth is apparent in the plot as a result of changing
grid levels. The cost of the precorrected-FFT method is generally greater when the 
Helmholtz kernel is used, in part because complex quantities must be manipulated, 
but mostly because a higher-order grid representation is necessary to accurately 
represent the charge in a cell. For the range of frequencies considered, the problems 
with a Helmholtz kernel appear to be roughly a factor of 2 to 10 slower than the
problems with a Laplace kernel. The growth with problem size of computation time 
and memory usage seems to be fairly irregular, for the choice of grids considered 
here. The observed irregularity occurs because the order of the approximation must 
change to maintain a fixed relationship between the wavelength and the size of a 
computational cell. 

Now we demonstrate that the precorrected-FFT technique can accurately compute 
solutions of integral equations with an oscillatory kernel. Assume a sphere of radius 
a, with the boundary conditions 

$$u(x) = h_3^{(1)}(ka) \sin^2\theta \cos\theta \cos 2\phi$$

which has solution $\psi(r,\theta,\phi) = h_3^{(1)}(kr) \sin^2\theta \cos\theta \cos 2\phi$. The sphere was discretized
along longitudes and latitudes, with 50 divisions in each variable, to generate a
problem with 2600 panels. We take $k = 4\pi$, corresponding to a sphere 4 wavelengths
in diameter. Fig. 5 shows the computed results. The agreement is excellent, and 
closer inspection shows the error in the computed fields to be less than 10“^, on the 
order of the GMRES tolerance. We have encountered no computational difficulties 
at much smaller or moderately larger wavelengths. 
Figure 5: Solid line: real part of exact solution. Dashed line: imaginary part of exact
solution. x: computed real part of solution. +: computed imaginary part of solution.

6. CONCLUSIONS


In this paper we described and carefully analyzed a collocation-grid-projection plus
precorrected-FFT method for solving potential integral equations with $1/r$ and $e^{ikr}/r$
kernels for a wide range of $k$. We demonstrated experimentally and analytically that
the errors are well-controlled, and showed that the method is competitive with
fast-multipole algorithms for $1/r$ kernels but is much more general. It should be noted that
the collocation-grid-projection plus precorrected-FFT method can be combined with
the multilevel methods in [3] to minimize the effects of inhomogeneity, but we have
yet to see the need for such an approach in practical applications.
SUMMARY

A multigrid method for the solution of finite difference approximations of elliptic PDEs 
is introduced. A parallelizable version of it, suitable for two and multi level analysis, is 
also defined, and serves as a theoretical tool for deriving a suitable implementation for the 
main version. For indefinite Helmholtz equations, this analysis provides a suitable mesh size 
for the coarsest grid used. Numerical experiments show that the method is applicable to 
diffusion equations with discontinuous coefficients and highly indefinite Helmholtz equations. 


1 INTRODUCTION 

The multigrid method is a powerful tool for the numerical solution of elliptic PDEs [4]. 
Its rate of convergence, however, deteriorates when non-elliptic problems are encountered; 
this phenomenon is due to error components (modes, eigenvectors) which have nearly zero 
eigenvalues with respect to the coefficient matrix. For convection problems, for example, 
error modes which are smooth in the convection direction are nearly singular and require 
a special treatment [6] [7]. For indefinite equations, we distinguish two classes of problems: 
(a) slightly indefinite problems, for which very few modes with negative eigenvalues (say two 
or three) exist, and (b) highly indefinite problems, for which many more such modes exist. 
For class (a), the method of [5], which is based on filtering nearly singular modes, achieves 
convergence rates which are close to those for the Poisson equation. The Cyclic Reduction 
Multigrid (CR-MG) of [8] is also superior to standard multigrid. For class (b), a projection 
method (suitable for finite element schemes) is presented in [3]. The AutoMUG method of 
[16] [17] [18] and a variant of Black Box Multigrid [15] also achieve satisfactory convergence 
rates especially when supplemented with an acceleration scheme. The two latter methods 
can also handle diffusion problems with discontinuous coefficients. 

The aim of this work is to supply a suitable implementation for AutoMUG for highly
indefinite Helmholtz equations. To this end, we introduce a parallelizable version of
AutoMUG, called Parallelizable AutoMUG (PAMUG). This method may be considered a
generalization of the Parallelizable Superconvergent Multigrid (PSMG) of [11] to nonsymmetric
and indefinite problems. PAMUG uses the fine grid at all levels, hence is suitable for
parallel architectures with a large number of processors; however, we do not use it as a solver
but only as a theoretical tool supplying a suitable implementation for AutoMUG. Due to
its simple algebraic formulation, PAMUG is suitable for two-level analysis in some cases.
Furthermore, in some model cases, including indefinite Helmholtz equations, the spectrum 
of the multi level iteration matrix is computable. This enables one to choose in advance a 
suitable mesh-size for the coarsest grid and a suitable acceleration scheme (if needed). Due 
to the similarity of AutoMUG and PAMUG, this implementation applies also to AutoMUG, 
as follows from numerical experiments. 

The content of this paper is as follows. In Section 2 AutoMUG and PAMUG are defined. 
In Section 3 they are analyzed. In Section 4 numerical experiments (using AutoMUG) are 
reported. 

2 THE AutoMUG AND PAMUG METHODS 
2.1 Abstract Definition of a Multi Level Method 

We start with an abstract definition of a multi level (ML) method for the solution of the 
linear system of equations 

Ax = b. 

In the following, $S : x \to Sx$ is a smoothing procedure and $\epsilon$, $r$, $t$ and $o$ are nonnegative
integers denoting, respectively, the cycle index, the number of presmoothings, the number of
postsmoothings and the minimal bandwidth of $A$ (with some ordering of variables) for which
ML is called recursively. The operators $R$ (restriction), $P$ (prolongation) and $Q$ (coarse grid
coefficient matrix) will be defined later.

    ML(x_in, A, b, x_out):
        if A is of bandwidth < o for some variable ordering
            x_out ← A^{-1} b
        otherwise:
            x_in ← S x_in   (repeat r times)
            e ← 0                                                  (1)
            ML(e, Q, R(A x_in − b), e_out),  e ← e_out   (repeat ε times)
            x_out ← x_in − P e
            x_out ← S x_out   (repeat t times).

An iterative application of ML is given by

    x_0 = 0, k = 0
    while ||A x_k − b||_2 > threshold · ||A x_0 − b||_2
        ML(x_k, A, b, x_{k+1})                                     (2)
        k ← k + 1
    end while.
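
The recursive structure of (1) and the outer loop (2) can be illustrated on a simple model problem. The sketch below is not AutoMUG: as a stand-in it uses a one-dimensional Poisson matrix $A = s \cdot tridiag(-1, 2, -1)$ with standard damped Jacobi smoothing, full-weighting restriction and linear interpolation, for which the Galerkin coarse matrix is again of the same form with $s/4$. The direct solve is triggered by the system size rather than the bandwidth, an iteration cap is added to the outer loop, and all function names are hypothetical.

```python
def apply_a(scale, x):
    """Return A x for A = scale * tridiag(-1, 2, -1) with zero Dirichlet ends."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append(scale * (2.0 * x[i] - left - right))
    return out


def smooth(scale, b, x, omega=2.0 / 3.0):
    """One damped Jacobi sweep, playing the role of S."""
    ax = apply_a(scale, x)
    return [x[i] + omega * (b[i] - ax[i]) / (2.0 * scale) for i in range(len(x))]


def restrict(v):
    """Full weighting onto every second point, playing the role of R."""
    return [0.25 * v[2 * j] + 0.5 * v[2 * j + 1] + 0.25 * v[2 * j + 2]
            for j in range((len(v) - 1) // 2)]


def prolong(e, n_fine):
    """Linear interpolation, playing the role of P."""
    v = [0.0] * n_fine
    for j, ej in enumerate(e):
        v[2 * j] += 0.5 * ej
        v[2 * j + 1] += ej
        v[2 * j + 2] += 0.5 * ej
    return v


def ml(x_in, scale, b, cycle_index=1, pre=1, post=1):
    """One call of the multi level procedure (1); returns x_out."""
    n = len(b)
    if n == 1:  # stand-in for the bandwidth test: solve directly
        return [b[0] / (2.0 * scale)]
    x = list(x_in)
    for _ in range(pre):
        x = smooth(scale, b, x)
    ax = apply_a(scale, x)
    coarse_rhs = restrict([ax[i] - b[i] for i in range(n)])
    e = [0.0] * len(coarse_rhs)
    for _ in range(cycle_index):
        # For this model problem R A P = (scale / 4) * tridiag(-1, 2, -1).
        e = ml(e, scale / 4.0, coarse_rhs, cycle_index, pre, post)
    pe = prolong(e, n)
    x = [x[i] - pe[i] for i in range(n)]
    for _ in range(post):
        x = smooth(scale, b, x)
    return x


def residual_norm(scale, b, x):
    """Euclidean norm of A x - b."""
    ax = apply_a(scale, x)
    return sum((ax[i] - b[i]) ** 2 for i in range(len(b))) ** 0.5


def iterate(scale, b, threshold=1e-8, max_iter=50):
    """The outer loop (2), with an iteration cap added as a safeguard."""
    x = [0.0] * len(b)
    r0 = residual_norm(scale, b, x)
    k = 0
    while residual_norm(scale, b, x) > threshold * r0 and k < max_iter:
        x = ml(x, scale, b)
        k += 1
    return x, k


if __name__ == "__main__":
    # -u'' = 1 on (0, 1) with 63 interior points (the size must be 2^m - 1).
    solution, iterations = iterate(64.0 ** 2, [1.0] * 63)
    print(iterations, solution[31])
```

Setting `cycle_index=2` in `ml` gives a W-cycle in place of the V-cycle; the AutoMUG operators defined later in the paper would replace `restrict`, `prolong` and the coarse-matrix rule.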
Below we define the operators $R$, $P$ and $Q$ of (1) for AutoMUG, its variant AutoMUG(q)
and the parallelizable versions PAMUG and PAMUG(q).


2.2 Some Matrix Functions


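Let $K$ be a positive integer and $I$ the identity matrix of order $K$. For any matrix $M$,
$M = (m_{i,j})_{1 \le i,j \le K}$, define the matrix functions

$$\begin{aligned}
rowsum(M) &= diag\Big(\sum_{j=1}^{K} m_{i,j}\Big)_{1 \le i \le K} \\
D(M) &= diag(M) \\
R(M) &= 2I - M D(M)^{-1} \\
Q(M) &= R(M) M \\
P(M) &= 2I - D(M)^{-1} M \\
S(M) &= rowsum(P(M)).
\end{aligned}$$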


These definitions apply to AutoMUG and PAMUG. For AutoMUG(q) and PAMUG(q),
replace the above definition of $S(M)$ by $S(M) = (2 + q)I$ (the role of the parameter $q$
will be explained later). Let $V_K$ be the space of the $K \times K$-grid functions (it is assumed
hereafter that the first point in a grid is numbered (1,1)). Define the orthogonal projection
$O : V_K \to V_{\lfloor K/2 \rfloor}$ by $(Ov)_{i,j} = v_{2i,2j}$ and the permutation $U$ by

$$(Uv)_{i,j} = v_{j,i}, \qquad v \in V_K.$$


For any matrix $B$, we say that $B$ is a $K$-block matrix if $B$ is block diagonal with tridiagonal
blocks of order $K$, that is,

$$B = blockdiag(B^{(j)})_{1 \le j \le K},$$

with

$$B^{(j)} = tridiag(b^{(j)}, c^{(j)}, d^{(j)}), \qquad 1 \le j \le K.$$

By the notation 'tridiag' we mean a periodically extended tridiagonal matrix, that is,
$B^{(j)}_{1,K} = b^{(j)}_1$ and $B^{(j)}_{K,1} = d^{(j)}_K$. We assume that either

$$b^{(j)}_1 = d^{(j)}_K = 0, \qquad 1 \le j \le K,$$

or $K = 2^k$ for some positive integer $k$. This guarantees that $A$ and the coarse grid coefficient
matrices defined below are of property-A. Actually, the block submatrices $B^{(j)}$ need not be
of the same size; for simplicity, however, we assume that they are. Non-rectangular grids
can be embedded into rectangular ones (see [9] [18]).
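
The matrix functions of this subsection are easy to evaluate on a small example. The sketch below uses plain nested lists and hypothetical function names; the 3 x 3 test matrix $tridiag(-1, 2, -1)$ (not periodically extended) is chosen only for illustration. It shows that $Q(M) = R(M)M$ removes the coupling between neighbouring variables, which is the cyclic-reduction effect the coarsening relies on.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def r_fn(m):
    """R(M) = 2I - M D(M)^{-1}: column j of M is divided by m[j][j]."""
    k = len(m)
    return [[(2.0 if i == j else 0.0) - m[i][j] / m[j][j] for j in range(k)]
            for i in range(k)]


def p_fn(m):
    """P(M) = 2I - D(M)^{-1} M: row i of M is divided by m[i][i]."""
    k = len(m)
    return [[(2.0 if i == j else 0.0) - m[i][j] / m[i][i] for j in range(k)]
            for i in range(k)]


def q_fn(m):
    """Q(M) = R(M) M."""
    return matmul(r_fn(m), m)


def s_fn(m):
    """Diagonal entries of S(M) = rowsum(P(M))."""
    return [sum(row) for row in p_fn(m)]


if __name__ == "__main__":
    m = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    for row in q_fn(m):
        print(row)
    print(s_fn(m))
```

In the output of `q_fn`, each row couples a variable only to itself and to variables two positions away, so the even and odd numbered unknowns decouple.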


2.3 Transfer and Coarse Grid Operators

Here we define the operators $R$, $P$ and $Q$ used in (1) for linear systems which arise, for
example, from finite difference approximations of elliptic PDEs.

Let $N$ and $n$ be positive integers, where $n \le \lfloor \log_2 N \rfloor$ denotes the number of levels minus
1. Assume that $A$ is of the form

$$A = X + Y, \qquad (3)$$

where $X$ and $UYU$ are $N$-block matrices. For example, if

$$X = UYU = blockdiag[tridiag(-1,\ 2 - \beta h^2/2,\ -1)], \qquad (4)$$

(where $\beta$ is a parameter and $h$ is the cell size) then $A$ represents a five-point second order
discretization of the Helmholtz equation

$$-u_{xx} - u_{yy} - \beta u = f \qquad (5)$$

in a square (the unit square is used here).

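Define $X_0 = X$ and $Y_0 = Y$. For $i = 1, \ldots, n$, define the matrices $R_i$, $P_i$ and $A_i$, in this
order, by

$$\begin{aligned}
X_i &= S(Y_{i-1})\, Q(X_{i-1}) \\
Y_i &= S(X_{i-1})\, Q(Y_{i-1}) \\
R_i &= O\, R(Y_{i-1})\, R(X_{i-1}) \\
P_i &= P(X_{i-1})\, P(Y_{i-1})\, O^T \\
A_i &= O (X_i + Y_i) O^T.
\end{aligned}$$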


These definitions apply to AutoMUG and AutoMUG(q). For the parallelizable versions
PAMUG and PAMUG(q), they are modified as follows: omit the operators $O$ and $O^T$ in the
above definitions and replace the definition of $P_i$ by $P_i = I$. The parameter $q$ in AutoMUG(q)
and PAMUG(q) is chosen by the user such that $S(X_{i-1})$ and $S(Y_{i-1})$ are optimally
approximated, in some sense, by $(2 + q)I$; for example, if $\beta$ in (5) varies with the spatial coordinates,
then a reasonable choice for $q$ is an average value of $-\beta h^2/4$. PAMUG(q) and AutoMUG(q)
are suitable for two-level analysis. For simplicity, $q = 0$ is used in most of this analysis.

The ML procedure, namely ML(x_in, A, b, x_out) defined in (1), is called $n + 1$ times per
iteration. In the $(n + 1)$st time, it is a direct solver. In order to implement AutoMUG,
AutoMUG(q), PAMUG or PAMUG(q), the $i$th call to the ML procedure, $1 \le i \le n$, uses
the operators

$$Q \leftarrow A_i, \quad R \leftarrow R_i \quad \text{and} \quad P \leftarrow P_i.$$

Note that, for PAMUG and PAMUG(q), $A_i$ includes four independent subsystems, each of
which corresponds to odd (even) numbered variables in the x and y spatial directions (see
[18]). Furthermore, the coarse grid equations in PAMUG and PAMUG(q) corresponding
to even numbered variables in both spatial directions are identical to those of AutoMUG
and AutoMUG(q), respectively. Roughly speaking, these methods have a similar effect on
low frequency error components, hence it is likely that convergence rate estimates for the
parallelizable versions are fair approximations to those for the sequential ones. This is
verified in Corollary 1 and the numerical experiments in Section 4. For certain examples, e.g.,
convection-diffusion equations with periodic boundary conditions, AutoMUG and PAMUG
are equivalent to AutoMUG(0) and PAMUG(0), respectively, because all the row-sums used
in AutoMUG and PAMUG are equal to the constant number 2 (as a matter of fact,
AutoMUG is equivalent to AutoMUG(0) also for other types of boundary conditions, provided
that $N$ is odd). This is also the case for either definite or indefinite Helmholtz equations,
provided that an appropriate $q \ne 0$ is used. Hence, one can learn about the features of
AutoMUG (which is actually used in our applications) from the analysis of PAMUG, PAMUG(q)
and AutoMUG(q).


3 ANALYSIS OF PAMUG AND AUTOMUG 


3.1 Two-Level Analysis 

Here we derive upper bounds for convergence rates for PAMUG(0) and AutoMUG(0) applied
to a class of equations, including Symmetric Positive Definite (SPD) Helmholtz equations
(e.g., $\beta h^2/2 < 4\sin^2(\pi h/2)$ in (4)). These bounds are independent of the size of the problem
and the clustering of the eigenvalues near zero. This implies that AutoMUG is capable of
handling nearly singular eigenvalues; hence, it may solve highly indefinite problems, provided
that the negative eigenvalues are handled by a suitable acceleration scheme (see also Section
3.3).

Since PAMUG is designed for parallel implementations, it may be assumed that the 
damped Jacobi iteration, which is perfectly parallelizable, is used as a smoothing procedure 
(for some architectures, two damped Jacobi relaxations are less expensive than one red-black 
Gauss-Seidel sweep). This simplifies the analysis considerably. 

The order in which smoothing and coarse grid correcting are performed is immaterial, due 
to the commutativity of the smoothing and coarse-grid correcting operators. For consistency, 
however, we consider damped Jacobi iterations for presmoothing and other methods (e.g., 
Jacobi) for postsmoothing. 

Theorem 1 Assume that 


• X and Y commute with each other. 

• D{X) = D{Y) = I (isotropy assumption). 

• the spectra of X and Y lie in the interval (0,2) (e.g., X and Y are symmetric M- 
matrices or symmetric irreducibly diagonally dominant matrices, see [20]). 


Then the convergence factor for a two-level implementation of PAMUG(0) with $r$ damped
Jacobi presmoothings (with damping factor 2/3) and no postsmoothings is bounded from
above by


max 


3r^ 

4(r -I-1)'’+^ 


A 1 

4e r 


( 6 ) 
For the proof, see Appendix A.

Corollary 1 Assume that $A$ is normal. Then Theorem 1, with the bound in (6) multiplied
by 2, applies also to AutoMUG(0), provided that one additional postsmoothing of the form
$x \leftarrow POx$ is performed.

For the proof, see Appendix B.


3.2 Multi-Level Analysis for PAMUG 


Theorem 1 yields convergence rates for the two-level implementation of PAMUG(0) applied to
essentially positive semidefinite problems. This implies that indefinite problems may also
be solved, provided that the negative eigenvalues are handled efficiently by an acceleration
scheme. In this section, we give quantitative support for this heuristic.

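Theorem 2 Assume that the blocks in $X$ and $UYU$ are circulant Toeplitz matrices, that is,

$$X = blockdiag[tridiag(b_0, c_0, d_0)]$$

$$UYU = blockdiag[tridiag(\beta_0, \gamma_0, \delta_0)]$$

for some constants $b_0$, $c_0$, $d_0$, $\beta_0$, $\gamma_0$ and $\delta_0$. Let

$$p_0 = \frac{b_0 + c_0 + d_0}{c_0}, \qquad q_0 = \frac{\beta_0 + \gamma_0 + \delta_0}{\gamma_0}.$$

For $0 \le i \le n - 1$, define

$$\begin{aligned}
b_{i+1} &= -(2 - q_i) b_i^2 / c_i, & c_{i+1} &= (2 - q_i)(c_i - 2 b_i d_i / c_i), \\
d_{i+1} &= -(2 - q_i) d_i^2 / c_i, & p_{i+1} &= (b_{i+1} + c_{i+1} + d_{i+1}) / c_{i+1}, \\
\beta_{i+1} &= -(2 - p_i) \beta_i^2 / \gamma_i, & \gamma_{i+1} &= (2 - p_i)(\gamma_i - 2 \beta_i \delta_i / \gamma_i), \\
\delta_{i+1} &= -(2 - p_i) \delta_i^2 / \gamma_i, & q_{i+1} &= (\beta_{i+1} + \gamma_{i+1} + \delta_{i+1}) / \gamma_{i+1}.
\end{aligned}$$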


Define 


g{c, r,p, q-,x,y) 


{2-x/c){2-y/j){x + y) 

(2 - q)x{2 - x/c) + (2 - p)y{2 - yfi) 
( x-\-y V 

/r(c, 7 ; p, q\ y) = ^(c, 7; p, 9; x, y) (1 - 

f!f^-'^\x,y) = fr{Cn-u^n-uPn-Uqn-UX,y). 

For z = n — 2, n — 3,..., 0, define 

/W(x,2/) = fr{ci,^i-,pi,qi\x,y) 

+ - qi)x{2 - x/ci), (2 - pi)y{2 - y/'yi)) 

x+y y 


{1- g{ci,'yi-,Pi,qi-,x,y)) 1 - 


a{ci + ji)^ 
Then there exists an orthogonal matrix $T$ such that the iteration matrix of PAMUG
(implemented with cycle index $\epsilon$ and $r$ damped Jacobi smoothings with damping factor $a^{-1}$) is
given by

T diag{fj: ^{p(^jy')}{x,y)^spect{X)y.spect{Y)T. 

For the proof, see Appendix C. Theorem 2 yields an efficient way to compute in advance the 
spectrum of the iteration matrix of PAMUG. This method is employed below for our model 
problem. 
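
The coefficient recursion of Theorem 2 involves only scalars, so the stencils on all levels can be tabulated before any solve. The following sketch assumes the recursion $b_{i+1} = -(2 - q_i) b_i^2 / c_i$, $c_{i+1} = (2 - q_i)(c_i - 2 b_i d_i / c_i)$, $d_{i+1} = -(2 - q_i) d_i^2 / c_i$ together with its counterpart for $\beta_i$, $\gamma_i$, $\delta_i$; the function name and the tuple layout of the result are hypothetical.

```python
def coarsen_coefficients(b, c, d, beta, gamma, delta, levels):
    """Tabulate (b, c, d, beta, gamma, delta, p, q) on levels 0..levels."""
    p = (b + c + d) / c
    q = (beta + gamma + delta) / gamma
    history = [(b, c, d, beta, gamma, delta, p, q)]
    for _ in range(levels):
        sx = 2.0 - q  # scalar multiplying the reduced X stencil
        sy = 2.0 - p  # scalar multiplying the reduced Y stencil
        # Right-hand sides use the old values; the tuple is assigned at once.
        b, c, d = (-sx * b * b / c,
                   sx * (c - 2.0 * b * d / c),
                   -sx * d * d / c)
        beta, gamma, delta = (-sy * beta * beta / gamma,
                              sy * (gamma - 2.0 * beta * delta / gamma),
                              -sy * delta * delta / gamma)
        p = (b + c + d) / c
        q = (beta + gamma + delta) / gamma
        history.append((b, c, d, beta, gamma, delta, p, q))
    return history


if __name__ == "__main__":
    # Helmholtz stencil (4) with beta_pde = 3200 and h = 1 / 256.
    diag = 2.0 - 3200.0 * (1.0 / 256.0) ** 2 / 2.0
    for level, row in enumerate(coarsen_coefficients(-1.0, diag, -1.0,
                                                     -1.0, diag, -1.0, 4)):
        print(level, row[1], row[6])
```

For the Poisson stencil $(-1, 2, -1)$ the recursion reproduces the same stencil on every level, which gives a quick sanity check; for the Helmholtz stencil the printed diagonal entries show how the indefiniteness grows with coarsening.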


3.3 The Indefinite Helmholtz Equation 

As discussed in [5], the most problematic eigenvalues of indefinite equations are those which 
are close to zero. Theorem 1 and Corollary 1 show that PAMUG (0) and AutoMUG(O) 
handle positive eigenvalues arbitrarily close to zero, giving convergence factors which are 
independent of the size of the problem and the clustering of the eigenvalues. Although this 
applies to the two-level method and definite problems, it indicates that the algorithm may 
also be efficient for the multi level method and indefinite problems. In this case, however, 
the cell-size of the coarsest grid cannot be arbitrarily large, as is shown below. 

When the coarsest grid is not too coarse, numerical computations using Theorem 2 show 
that the PAMUG iteration matrix has only a few eigenvalues of magnitude larger than one. 
These eigenvalues may be annihilated (their corresponding error components are significantly 
reduced) by an appropriate Krylov space acceleration method applied to the basic multi level 
iteration (2). The remaining eigenvalues are considerably smaller (in magnitude) than one; 
good convergence rates are thus achievable, provided that the dimension of the Krylov space 
is large enough, say twice as large as the number of eigenvalues of magnitude greater than 
one. When the number of levels is large, so that very coarse grids are used, the spectrum 
of the iteration matrix significantly deteriorates; the magnitude of many eigenvalues then 
approaches one and exceeds it. 

Thus, Theorem 2 may help in choosing in advance an appropriate dimension for the
Krylov space in the acceleration method. For highly indefinite problems, however, this
dimension must be rather large; in this case, a conventional acceleration method, such as
GMRES of [14], will not do, since the required amount of storage (respectively, arithmetical
operations) increases linearly (respectively, quadratically) with the dimension of the Krylov
space used. The Transpose Free Quasi Minimal Residual method of [12] and the Conjugate
Gradient Squared method of [19], which use arbitrarily large Krylov spaces with fixed
requirements of work and storage, are thus preferable.

Consider the indefinite Helmholtz equation (5) in the unit square with periodic boundary
conditions, discretized as in (3), (4). Our aim is to compute the spectrum of the PAMUG
iteration matrix for this problem. In this case,

$$spect(X) = spect(Y) = \{4\sin^2(\pi j/N) - \beta h^2/2\}_{0 < j < N}.$$

Modes which are constant in either one of the spatial directions are excluded; this is
equivalent to assuming that the right hand side includes no Fourier modes which are constant in
one of the spatial directions, and the equation is projected onto the linear subspace
orthogonal to the set of these modes. This situation simulates problems with Dirichlet boundary
conditions, since the spectrum of $X$ and $Y$ is not enlarged by the transformation

    periodic boundary conditions → Dirichlet boundary conditions
    N → N/2 − 1
    β → β/4.

One damped Jacobi smoothing (with damping factor 1/2) and two Jacobi smoothings are 
used in each level of a V-cycle. This implementation is chosen in order to cancel possible 
poles of the function g of Theorem 2 (and the proof of Theorem 1) and guarantee that the 
functions there are bounded. Indeed, it is verified that no pole of the functions is 
encountered during the computation. This choice was the most efficient one; using, e.g., 
damping factor 1/2 for all the three relaxations yields worse results. This is another place 
where the theory helps in choosing a suitable implementation; however, it is suitable only 
for ideal parallel machines, whereas in practice (Section 4) we use AutoMUG with the more 
efficient red-black Gauss-Seidel relaxation. 

The results are displayed in Figures 1 and 2. The last rows of these figures show how the
spectrum deteriorates when the coarsest grid is too coarse. Here $\beta = 3200$ and we find that
for $N = 256$ and 512, respectively, using 3 and 4 levels yields only a few large eigenvalues.
The remaining eigenvalues are contained in $[-0.25, 0.25]$, which implies that the effective rate
of convergence should be around 0.25, provided that the large eigenvalues can be handled by
the acceleration. Consequently, a 64 x 64 coarsest grid is suitable for achieving this rate of
convergence. In light of the above discussion, it is expected that for Dirichlet problems and
$\beta = 800$ the choices $N = 127$ and $N = 255$ yield pictures which are much the same as those
of Figures 1 and 2, respectively; hence a 31 x 31 coarsest grid is suitable in this case. When
a further coarser grid, namely, a 15 x 15 grid, is used, the eigenvalues of the iteration matrix
are clustered around ±0.7; thus, a convergence factor of at least 0.7 is expected in this case
(see Table 2 below). It can also be inferred from the figures that the number of levels is
immaterial; what matters is the cell-size of the coarsest grid alone. This is in agreement
with a result of [3] (see also Table 1 below).

There is also a physical explanation for the above lower bound on the resolution of the
coarsest grid. For Equation (5), consider waves of wave number $(k, l)$ satisfying
$\pi^2 (k^2 + l^2) \approx \beta$. Evidently, these waves appear in the solution, since they are amplified by the inverse of
the operator. Hence, an appropriate coarse grid must be capable of approximating these
modes. In particular, it should be sufficiently fine to approximate the above modes with
$k = 0$ (resp., $k = 1$) and $l = 0$ (resp., $l = 1$) for periodic (resp., Dirichlet) boundary
conditions. In light of the Nyquist rate, a proper approximation requires 2 points per wave
length; this yields roughly $\lfloor N/2^n \rfloor \ge 2\sqrt{\beta}/\pi$.
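
The resolution criterion can be turned into a quick rule for the admissible number of coarsenings. The sketch below assumes the bound $\lfloor N/2^n \rfloor \ge 2\sqrt{\beta}/\pi$ with $N$ the fine-grid size and $n$ the number of levels minus one; the function names are hypothetical.

```python
import math


def min_coarse_points(beta):
    """Smallest admissible coarsest-grid size per direction, 2 sqrt(beta) / pi."""
    return 2.0 * math.sqrt(beta) / math.pi


def max_coarsenings(n_fine, beta):
    """Largest n such that floor(n_fine / 2^n) still satisfies the bound."""
    bound = min_coarse_points(beta)
    n = 0
    # n_fine >> (n + 1) equals floor(n_fine / 2^(n + 1)) for integers.
    while (n_fine >> (n + 1)) >= bound:
        n += 1
    return n


if __name__ == "__main__":
    for n_fine, beta in ((256, 3200.0), (512, 3200.0), (127, 800.0), (255, 800.0)):
        n = max_coarsenings(n_fine, beta)
        print(n_fine, beta, "levels:", n + 1, "coarsest grid:", n_fine >> n)
```

For $\beta = 3200$ this gives 3 levels at $N = 256$ and 4 levels at $N = 512$, both with a 64 x 64 coarsest grid, and for $\beta = 800$ with $N = 127$ a 31 x 31 coarsest grid, in line with the choices discussed above.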

Another explanation for the above restriction arises from matrix theory. It was observed 
that for sufficiently fine grids, the coefficient matrix is an L-matrix, that is, has positive main 
diagonal elements and nonpositive off-diagonal elements. For too coarse grids the amount of 
indefiniteness is so large that the main diagonal elements become negative, which leads to 
an inappropriate PDE approximation. 
Figure 1: Eigenvalues (of magnitude > 0.25) of the PAMUG iteration matrix for the indefinite
Helmholtz equation with $\beta = 3200$, N = 256 and periodic boundary conditions.

Figure 2: Eigenvalues (of magnitude > 0.25) of the PAMUG iteration matrix for the indefinite
Helmholtz equation with $\beta = 3200$, N = 512 and periodic boundary conditions.
4 NUMERICAL EXPERIMENTS


4.1 A Comparison of Various Multigrid Methods 


We apply AutoMUG and several other multigrid algorithms to the problem

$$-u_{xx} - u_{yy} - 800u = f, \quad (x, y) \in \Omega = (0,1) \times (0,1),$$

with complex boundary conditions of the third kind

$$\frac{\partial u}{\partial n} + 10iu = g, \quad (x, y) \in \Gamma \subset \partial\Omega$$

(where n is the outer normal vector) and Dirichlet boundary conditions on
$\partial\Omega \setminus \Gamma$. We consider the following cases:

(a) $\Gamma = \emptyset$

(b) $\Gamma = \{0\} \times [0,1]$.

The equation is discretized via a second-order five-point difference scheme (as in (3)-(4)).
Uniform N x N grids are used. The exact solution is u = xy. The initial guess is random in
(0, 1).

To the basic multi-level iteration (2), we apply the Transpose Free Quasi Minimal Residual
(TFQMR) acceleration method (Algorithm 5.2 in [12]), which avoids the computation
of the transpose of the coefficient matrix and preconditioner (the latter is only implicitly
given in (1), so its transpose is not available). TFQMR may be considered a modification
of the Conjugate Gradient Squared (CGS) method of [19]. The costs of these acceleration
techniques are comparable to that of the Conjugate Gradient method, that is, about 1-1.5
work units per iteration. We found that the performance of CGS and TFQMR is similar;
we preferred the latter, though, because of its smooth convergence curve.

The multi-level methods are implemented with the red-black Gauss-Seidel (RB) smoother
in a V(1,1)-cycle. The coarsest level equation is solved with six orders of magnitude accuracy.

We define the following measures of efficiency: the convergence factor

$$cf = \frac{\|Ax_{last} - b\|_2}{\|Ax_{last-1} - b\|_2}$$

and the averaged convergence factor

$$avcf = \left( \frac{\|Ax_{last} - b\|_2}{\|Ax_0 - b\|_2} \right)^{1/last},$$

where last is the smallest positive integer for which

$$\frac{\|Ax_{last} - b\|_2}{\|Ax_0 - b\|_2} \le threshold$$

and threshold is about 10~®. When acceleration is used, the convergence factor often oscil¬ 
lates; hence, for the highly indefinite examples, only avcf is reported. 
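Both measures can be computed from a recorded history of residual norms. The sketch below assumes that cf is the ratio of the last two residual norms and that avcf is the geometric mean reduction per iteration; the function name and the default tolerance are placeholders chosen for illustration, not values taken from the paper.

```python
def convergence_factors(res_norms, threshold=1e-6):
    """Return (last, cf, avcf) for a list of residual 2-norms.

    res_norms[k] is assumed to hold ||A x_k - b||_2, with res_norms[0]
    belonging to the initial guess. `threshold` is a placeholder tolerance.
    """
    r0 = res_norms[0]
    last = None
    for k in range(1, len(res_norms)):
        # `last` is the smallest positive index meeting the tolerance.
        if res_norms[k] / r0 <= threshold:
            last = k
            break
    if last is None:
        raise ValueError("tolerance not reached in the recorded history")
    cf = res_norms[last] / res_norms[last - 1]
    avcf = (res_norms[last] / r0) ** (1.0 / last)
    return last, cf, avcf


if __name__ == "__main__":
    history = [0.5 ** k for k in range(40)]  # synthetic, halves every step
    print(convergence_factors(history))
```

For a history that shrinks by a constant factor per iteration, cf and avcf coincide; with acceleration the per-step ratio oscillates, which is why only avcf is a stable summary there.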

AutoMUG is compared to three other multigrid methods which share the same complexity
(that is, use 5-coefficient stencils at all levels):

1. Standard Multigrid (MG): coarse grid operators are derived from rediscretizations of
the differential equation; full-weighting and bilinear interpolation are used for restriction
and prolongation, respectively.

2. Cyclic Reduction Multigrid (CR-MG) [8]: coarse grid operators, restriction and
prolongation are defined as in [8].

3. Full CR-MG (F-CR-MG): coarse grid operators are generated from [8]; full-weighting
and bilinear interpolation are used for restriction and prolongation, respectively.

The results are displayed in Tables 1 and 2. 


Table 1: Averaged convergence factors (avcf) for various multigrid methods (with TFQMR
acceleration). The results show that once the resolution of the coarsest grid is fixed, the rate
of convergence is independent of the number of levels.

  N     levels   case   MG     F-CR-MG   CR-MG   AutoMUG
  255   4        (a)    .540   .267      .614    .277
  127   3        (a)    .549   .272      .506    .280
  63    2        (a)    .561   .273      .404    .312
  63    2        (b)    .651   .694      .748    .396

Table 2: Averaged convergence factors (with TFQMR acceleration) showing the deterioration
of convergence rates when the resolution of the coarsest grid is too coarse.

  N     levels   case   MG     F-CR-MG   CR-MG   AutoMUG
  63    2        (a)    .561   .273      .404    .312
  63    3        (a)    > .9   .771      > .95   .737


Remark: it was also found that for diffusion problems with discontinuous coefficients (e.g.,
Examples 7 and 9 in [18]) MG and both variants of CR-MG stagnate.


4.2 Problems with Discontinuous Coefficients 


AutoMUG and two variants of Black Box Multigrid are applied to problems of the form

$$-\nabla(D\nabla u) - \sigma u = f \quad \text{in } \Omega = (0, \omega_2) \times (0, \omega_2),$$

with

$$j(t) = \begin{cases} 0 & 0 < t < \omega_1 \\ 1 & \omega_1 < t < \omega_2, \end{cases}$$

$$D(x,y) = \begin{cases} d_r & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 0 \\ d_b & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 1 \\ d_o & (x,y) \notin \Omega, \end{cases}$$

$$\sigma(x,y) = \begin{cases} \sigma_r & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 0 \\ \sigma_b & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 1 \\ \sigma_o & (x,y) \notin \Omega, \end{cases}$$

$$f(x,y) = \begin{cases} 0 & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 0 \\ 1 & (x,y) \in \Omega,\ j(x) + j(y) \bmod 2 = 1 \\ 0 & (x,y) \notin \Omega, \end{cases}$$

and mixed boundary conditions of the form

$$D u_n + \gamma_0 u = 0, \quad x = 0 \text{ or } y = 0,$$
$$D u_n + \gamma_1 u = 0, \quad x = \omega_2 \text{ or } y = \omega_2$$

(where $\omega_1$, $\omega_2$, $\gamma_0$, $\gamma_1$, $d_r$, $d_b$, $d_o$, $\sigma_r$, $\sigma_b$ and $\sigma_o$ are parameters).
The finite volume discretization of [2] is used. However, since it results in a strong coupling
between domains which are only weakly coupled in the PDE and, hence, in an inadequate
scheme (see [2]), it is not applied to the original but to the modified problem
$-\nabla(\tilde{D}\nabla u) - \tilde{\sigma} u = f$, where


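$$\varepsilon = \frac{d_r + d_b}{2} \min(d_r/d_b,\ d_b/d_r), \qquad \delta = \frac{\sigma_r + \sigma_b}{2} \min(d_r/d_b,\ d_b/d_r),$$

$$\tilde{D}(x,y) = \begin{cases} \varepsilon & |x - \omega_1| + |y - \omega_1| < h \\ D(x,y) & \text{otherwise,} \end{cases}$$

$$\tilde{\sigma}(x,y) = \begin{cases} \delta & \max(|x - \omega_1|, |y - \omega_1|) < h/2 \\ \sigma(x,y) & \text{otherwise.} \end{cases}$$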


A uniform 63 x 63 fine grid is used (the only exceptions are Examples (12)-(13) in
Table 3, representing the 'staircase' problem of [2], where a uniform 17 x 17 fine grid is used).
Dirichlet boundary conditions are denoted by $\gamma_0 = \gamma_1 = \infty$. In this
case, no grid point lies on $\partial\Omega$; all equations are non-trivial. The initial guess is zero.

The results in Table 3 correspond to the following methods: (A) AutoMUG; (B) Black
Box Multigrid [9]; and (C) the second method in [10]. For Examples (1)-(11), these methods
were implemented with coarse grids consisting of even numbered variables of the next finer
grid (similar results, however, were obtained when odd numbered variables were used for
this purpose). The off-diagonal row-sum modification introduced in [10] is not used, since
(apart from Examples (12)-(13)) coarse grids do not include boundary points of the next
finer grid (see [15]). Also, prolongation is done without using the right hand side, since it
was found in [15] that this does not improve the convergence for indefinite problems.

The multigrid cycle is implemented as in the previous subsection. For methods (B) and
(C), however, since 9-coefficient stencils are used, RB is replaced by the four-color ordering of
[1]. Acceleration is used only for highly indefinite problems, namely, when
$\max(\sigma_r, \sigma_b) > 100$.

A comparison of Examples (1) and (2) of Table 3 shows that, as implied by Corollary 1, 
AutoMUG (with no acceleration) performs for nearly singular Helmholtz equations almost as 
well as for the Poisson equation. For more highly indefinite problems, however, acceleration 
must be used. 
Table 3: Three multigrid methods, (A) AutoMUG, (B) Black Box Multigrid and (C) the
second method of Dendy (87), applied to definite and indefinite problems with discontinuous
coefficients. Uniform 63 x 63 (resp., 17 x 17) fine grids are used for Examples (1)-(11) (resp.,
(12)-(13), the 'staircase' problem).

Description of examples

  example  omega_1  omega_2  gamma_0  gamma_1  d_r    d_b  d_o  sigma_r  sigma_b  sigma_o  acceleration
  (1)      -        1        oo       oo       1      1    1    0        0        0        no
  (2)      -        1        oo       oo       1      1    1    20       20       20       no
  (3)      -        1        oo       oo       1      1    1    400      400      400      yes
  (4)      -        ?        ?        10i      1      1    0    400      400      0        yes
  (5)      30/62    ?        10i      10i      1      1    0    0        400      0        yes
  (6)      31/62    ?        10i      10i      1      1    0    0        400      0        yes
  (7)      30/62    ?        ?        10i      1000   1    0    0        400      0        yes
  (8)      31/62    ?        ?        10i      1000   1    0    0        400      0        yes
  (9)      ?        -        0        ?        1      ?    0    0        -        0        no
  (10)     -        ?        0        ?        -      ?    0    0        -        0        no
  (11)     ?        62       0        0.5      ?      ?    0    0        -        0        no

Numerical results

                    cf                      avcf
  example  levels   A      B      C         A      B      C
  (1)      4        -      -      .159      ?      .072   .184
  (2)      4        -      .431   > 1       ?      .507   > 1
  (3)      3        -      -      -         .336   .702   .835
  (4)      3        -      -      -         .329   .335   .567
  (5)      3        -      -      -         .369   .315   .516
  (6)      3        -      -      -         .295   .285   .464
  (7)      3        -      -      -         .298   .283   > .8
  (8)      3        -      -      -         .291   .341   .530
  (9)      4        -      .118   .238      .151   .114   .267
  (10)     4        .381   -      .211      .429   .142   .232
  (11)     4        .148   .987   .988      .192   -      -
  (12)     2        .153   .121   .133      .196   .141   .151
  (13)     3        > 1    .220   .240      > 1    .237   .269
Examples (9)-(13) deal with diffusion problems with discontinuous coefficients. In particular,
Examples (12)-(13) are the 'staircase' problem (Example IV in [2], where D = 1000
inside the staircase and D = 1 outside).

It is evident from Example (11) that Black Box Multigrid stagnates when the break point
$\omega_1$ lies on the coarse grids. The reason for this is that the 9-coefficient stencils of its coarse
grid operators involve strong coupling between domains which are only weakly coupled in
the PDE. Hence, in this case, the 5-coefficient stencils of AutoMUG are preferable (see [15]
for a variant of Black Box Multigrid which overcomes this problem).

It is interesting to mention that when D, rather than $\tilde{D}$, is used for the finite volume
discretization in Example (10), Black Box Multigrid converges rapidly while AutoMUG
diverges. However, in light of the remarks made in [2], it is not clear whether the resulting
scheme is meaningful.

Acknowledgment. The author wishes to thank Moshe Israeli for suggesting the physical
motivation for the restriction on the grid resolution and Irad Yavneh for his valuable
comments.


APPENDIX A 


Proof of Theorem 1: Let v be a common eigenvector of X, Y and A with the eigenvalues x,
y and x + y, respectively. Then v is also an eigenvector of the iteration matrix of PAMUG(0)
with the corresponding eigenvalue $f_r(x,y)$, where

$$g(x,y) = 1 - \frac{(2-x)(2-y)(x+y)}{2(x(2-x) + y(2-y))} = \frac{xy(4-x-y)}{2(x(2-x) + y(2-y))}
\quad \text{and} \quad f_r(x,y) = \left(1 - \frac{x+y}{3}\right)^r g(x,y).$$

To prove the theorem, it is sufficient to bound $|f_r|$ in the region $0 < x, y < 2$. In this region,
$0 \le |f_r| \le g < 1$. Since g is symmetric, it is natural to write it as a function of the symmetric
variables c = x + y and d = xy. Clearly, $(c, d) \in (0,4) \times (0,4)$,

$$g(c,d) = \frac{d(4-c)}{2(2c - c^2 + 2d)}.$$

The partial derivative of g with respect to d is

$$\frac{\partial g}{\partial d}(c,d) = \frac{(4-c)(2c - c^2 + 2d) - 2d(4-c)}{2(2c - c^2 + 2d)^2}
= \frac{(4-c)c(2-c)}{2(2c - c^2 + 2d)^2}.$$

Hence $\partial g/\partial d > 0$ if 0 < c < 2, $\partial g/\partial d = 0$ if c = 2 and $\partial g/\partial d < 0$ if 2 < c < 4. Assume that
0 < c < 2. Then g achieves its maximum on the hyperbola xy = d for which d is maximal.
This happens at the point x = y = c/2. But at this point we have g = c/4 and

$$f_r = \left(\frac{3-c}{3}\right)^r \frac{c}{4} \equiv h(c).$$

We find the maxima of h:

$$h'(c) = \frac{1}{4}\left(\frac{3-c}{3}\right)^{r-1}\left(\frac{3-c}{3} - \frac{cr}{3}\right) = 0,$$

or 3 - c - cr = 0 or c = 3/(r + 1). The maximum of h in (0,2) is thus

$$h\left(\frac{3}{r+1}\right) = \frac{3}{4} \cdot \frac{r^r}{(r+1)^{r+1}}.$$

The theorem follows from $|f_r| \le (1/3)^r$ in the region $2 \le c < 4$. □


APPENDIX B 


Proof of Corollary 1: For $i \in \{0,1\}$, define the injections $O_{x,i}$ and $O_{y,i}$ by

$$(O_{x,i}v)_{l,m} = \begin{cases} v_{l,m} & l \equiv i \bmod 2 \\ 0 & l \not\equiv i \bmod 2 \end{cases}
\quad \text{and} \quad
(O_{y,i}v)_{l,m} = \begin{cases} v_{l,m} & m \equiv i \bmod 2 \\ 0 & m \not\equiv i \bmod 2 \end{cases}$$

($O_{x,i}$ injects onto every other y-line and $O_{y,i}$ injects onto every other x-line). Let v be a
common eigenvector of X and Y with the corresponding eigenvalues $x_v$ and $y_v$, respectively.
Since X and Y are of property-A, it follows from [21], Sec. 7.1 that the following is a set of
common eigenvectors of X and Y:


w=\ y: (-i)“^®o«Ow4 

The elements of W are orthogonal to each other and have the same I 2 norm. Denote by x^ 
(resp., yyj) the eigenvalue of an element w gW with respect to X (resp., Y). Define the set 
of vectors 

V = . 

Define the symmetric orthogonal discrete Haar transform 

Hence W = HV and V = HW. Let Ma and Mp denote the iteration matrices of Auto- 
MUG(O) and PAMUG(O), respectively. Note that 0^0 = Ox^Oy^ and that OMa = OMp. 
The assumption that a postsmoothing of the form x e- POx is performed is equivalent 
to replacing the substitution Xout ■(— — Pe in (1) by Xgut G- P{Oxin — e). From these 

observations and the proof of Theorem 1, it follows that, for any w gW, 

MaW = fri^WiVv]) ^4 (f (1 — VvYOx,iOyjV. 

Consequently, span{W) is an invariant subspace of Ma- Let Ma denote the restriction of 
Ma to span{W). The representation of Ma in the basis W is of the form Ma = 2~^Hpu*, 
where p and u are the following four-dimensional vectors: 

p — (1) 1 Xy, 1 Py, (1 2:i,)(l Vv)) and u — {frixyj, ywYwGW' 
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Let p denote the spectral radius of a matrix. Then 

\\Ma\\ < = ^Ibll I|m|| < \H\ < 2max\fr{x^,y^)\. □ 


APPENDIX C 

Proof of Theorem 2: Let

$A_0 = A$ and $D_i = \mathrm{diag}(A_i)$, $0 \le i \le n - 1$.

Consider the ith call to the PAMUG procedure in the PAMUG method (1), $1 \le i \le n$. This
call is designated to solve the equation $A_{i-1} e = f$. For this equation, denote the two-level
PAMUG iteration matrix by $N_{i-1}$ and the multi-level PAMUG iteration matrix by $M_{i-1}$.
For a PAMUG cycle with index e, we have (see [13]) $M_{n-1} = N_{n-1}$, and, for $0 \le i < n - 1$,

Mi = (/-(/-M;„K-+\iii«.4j)(/-a-'DrUi)’' 

= JVi + M^,4-iRi+,4, (/ - a-‘DrUi)". 

It is easily seen by induction that all the operators $X_i$, $R(X_i)$, $UY_iU$ and $UR(Y_i)U$, for every
i, are block diagonal with circulant Toeplitz blocks. Hence, all the operators $A_i$, $D_i$ and $R_i$,
for every i, are diagonalizable by the 2-dimensional discrete Fourier transform; hence, so are
the operators $N_i$ and, by induction, the operators $M_i$. The theorem follows from
spectral analysis. □
COMPRESSIBLE EULER EQUATIONS*
Hampton, VA 23681


SUMMARY 


We present a new genuinely multidimensional discretization for the compressible
Euler equations. It is the only high-resolution scheme known to us where Gauss-Seidel
relaxation is stable when applied as a smoother directly to the resulting high-resolution
scheme. This allows us to construct a very simple and highly efficient
multigrid steady-state solver. The scheme is formulated on triangular (possibly
unstructured) meshes.


INTRODUCTION 


One of the most challenging problems in numerical analysis was the construction of a
numerical scheme for gas dynamics in one dimension. Such a scheme had to combine
high-order accuracy in the regions of smooth flow with the ability to represent
discontinuities by thin oscillation-free layers. These two properties are not both
attainable within the class of linear schemes (Godunov's theorem). Therefore, a
successful scheme must be non-linear. Schemes of this type were named high-resolution
schemes. The discrete schemes for the equations of gas dynamics in multidimensions
are usually obtained using the dimensional-splitting approach, i.e., applying a
one-dimensional scheme in each coordinate direction. The main problem, however, is that
the steady-state solvers based on such schemes suffer from poor computational
efficiency. It was observed by Spekreijse [1] that such a simple and efficient smoother as
pointwise Gauss-Seidel relaxation is unstable in conjunction with such schemes even
in the simple case of the linear advection equation. The multigrid solvers, therefore, have
to resort to multi-stage Runge-Kutta relaxation or to defect-correction techniques,
which are not really efficient ways to utilize the multigrid approach.

*This research was supported by the National Aeronautics and Space Administration under
NASA Contract No. NAS 1-19480.
The reason that Gauss-Seidel relaxation is unstable when applied
in conjunction with the dimensionally-split high-resolution schemes can be traced
to the particular way the nonlinearity is incorporated within these schemes.
This motivated the search for a high-resolution (at least at the steady-state) scheme,
with the nonlinear high-resolution correction introduced in such a way that it does
not lead to the instability of the Gauss-Seidel relaxation. This search resulted in
the genuinely multidimensional advection scheme of the control volume type (see
[2],[3]). The so-called fluctuation-splitting type schemes (for unstructured triangular
meshes) were also introduced (see [4],[5]). A strong relationship between the two
types was established in [6]. However, it was not clear for a long time how to extend
these ideas to systems of equations. One of the major directions was the so-called
wave modeling (see [7],[8]). This approach concentrated on finding a way to
represent (locally) the physics of two-dimensional flow of a compressible fluid by a
finite number of simple waves, each one having an associated advection equation.
However, numerical schemes created this way suffered from a lack of robustness. The
approach introduced in [9] is concerned not with applying an advection scheme to
discretize a system of equations in two dimensions, but rather with applying to the
systems of equations the same strategy that was used when constructing a scalar
advection scheme. The resulting genuinely two-dimensional scheme is formulated
on triangular (possibly unstructured) meshes. The unique advantage of this
high-resolution discretization is that the Collective Gauss-Seidel relaxation can be applied
directly to the high-resolution discrete equations. This results in a very simple and
efficient multigrid steady-state solver.

In this paper we first introduce some further enhancements to the scheme presented
in [9]. Numerical experiments are then presented, and some possible extensions of
the truly multidimensional approach are discussed.

GENUINELY TWO-DIMENSIONAL ADVECTION SCHEME 

Consider a linear two-dimensional advection equation

$$u_t + a u_x + b u_y = 0. \qquad (1)$$

Consider the triangulation of the domain as illustrated on Fig.1. Denote by R the
fluctuation (i.e., the residual of equation (1) on triangle T multiplied by the area of
this triangle):

$$R = R^x + R^y, \qquad (2)$$

where

$$R^x = -\frac{h}{2}[a(u_0 - u_3)], \qquad R^y = -\frac{h}{2}[b(u_3 - u_4)].$$

The following fluctuation distribution formulae

$$h^2 u_0^{n+1} = h^2 u_0^n + \tfrac{1}{2} R^x$$
$$h^2 u_3^{n+1} = h^2 u_3^n + \tfrac{1}{2}[R^x + R^y] \qquad (3)$$
$$h^2 u_4^{n+1} = h^2 u_4^n + \tfrac{1}{2} R^y$$

reproduce the central difference scheme, which is second-order accurate (in space) but
is known to be unstable.

We shall introduce here the positivity property.

Definition 1. A scheme is said to be of the positive type if any solution value on 
the new time level obtained by this scheme can be written as a positive combination 
of the values from the previous time level. 

Solutions obtained by using positive schemes satisfy a certain maximum principle 
and, therefore, do not exhibit oscillatory behavior in the presence of discontinuities. 
It is obvious that the central scheme (3) is not of the positive type. 

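Modifying (3) by adding the appropriate artificial viscosity terms

$$h^2 u_0^{n+1} = h^2 u_0^n + \tfrac{1}{2}[R^x(1 + \mathrm{sign}(a))]$$
$$h^2 u_3^{n+1} = h^2 u_3^n + \tfrac{1}{2}[R^x(1 - \mathrm{sign}(a)) + R^y(1 + \mathrm{sign}(b))] \qquad (4)$$
$$h^2 u_4^{n+1} = h^2 u_4^n + \tfrac{1}{2}[R^y(1 - \mathrm{sign}(b))]$$

we recover the dimensional upwind scheme, which is positive but only first order
accurate.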

Definition 2. The fluctuation-splitting scheme is called linearity preserving if 
whenever the fluctuation on the triangle T vanishes then the scheme leads to a zero 
update in each of the three vertices of the triangle. 

The upwind scheme (4) does not satisfy this property since the fact that R = 0
does not necessarily imply that $R^x = R^y = 0$. Therefore, a non-zero update of the
nodal values may be introduced.

Introduce the following quantities

$$R^{x*} = R^x + R^y \Psi(Q), \qquad R^{y*} = R^y - R^y \Psi(Q), \qquad (5)$$

where

$$Q = -\frac{R^x}{R^y} \qquad (6)$$

and $\Psi$ is a Lipschitz continuous limiter function such that

$$0 \le \Psi(Q) \le 1, \qquad 0 \le \frac{\Psi(Q)}{Q} \le 1 \qquad (7)$$

and

$$\Psi(1) = 1. \qquad (8)$$

Substituting $R^{x*}$, $R^{y*}$ for $R^x$, $R^y$ in (4) yields a scheme that satisfies the linearity
preserving property. This can be demonstrated in the following way: assume that R = 0. This means that
$R^x = -R^y$ or $Q = -R^x/R^y = 1$. It can be seen that no update will be introduced to
any of the unknowns at the nodes of triangle T, provided the limiter function satisfies
the equality (8). This scheme is also second order accurate at the steady-state, since
the grid considered here is structured (see [6]).
Using the following identity

$$R^y \Psi(Q) = -R^x \frac{\Psi(Q)}{Q} \qquad (9)$$

we can rewrite (5) in the following form

$$R^{x*} = R^x \left(1 - \frac{\Psi(Q)}{Q}\right), \qquad R^{y*} = R^y (1 - \Psi(Q)). \qquad (10)$$

It is easy to see that the scheme defined by (4) and (5) (or (10)) is of positive type,
provided the inequality (7) holds.

It is also obvious from (9) that such a scheme is conservative because

$$R^{x*} + R^{y*} = R^x + R^y = R$$

(for more details see [6]).
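The limited fluctuations can be illustrated with a few lines of code. This is a minimal sketch, assuming the modified fluctuations are $R^{x*} = R^x + R^y\Psi(Q)$ and $R^{y*} = R^y - R^y\Psi(Q)$ with $Q = -R^x/R^y$, and using the minmod limiter $\Psi(Q) = \max(0, \min(1, Q))$ as one admissible choice; the function names and the treatment of $R^y = 0$ (no modification, since $\Psi$ is bounded) are assumptions made for illustration.

```python
def minmod_limiter(q):
    """minmod(1, q): satisfies 0 <= psi <= 1, 0 <= psi/q <= 1, psi(1) = 1."""
    return max(0.0, min(1.0, q))


def limited_fluctuations(rx, ry, limiter=minmod_limiter):
    """Return the modified pair (rx_star, ry_star).

    rx, ry are the x- and y-parts of the fluctuation on one triangle.
    When ry is zero the correction ry * psi vanishes, so nothing changes.
    """
    if ry == 0.0:
        return rx, ry
    q = -rx / ry
    psi = limiter(q)
    return rx + ry * psi, ry - ry * psi


if __name__ == "__main__":
    # Vanishing total fluctuation: both parts are annihilated.
    print(limited_fluctuations(0.7, -0.7))
    # Generic case: the sum of the two parts is unchanged.
    print(limited_fluctuations(1.0, -2.0))
```

The first call shows linearity preservation (R = 0 gives a zero update), the second shows conservation (the two parts still sum to R).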

MULTIDIMENSIONAL EULER SCHEME


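The Euler equations of gas dynamics in two dimensions can be written

$$\mathbf{u}_t + F(\mathbf{u})_x + G(\mathbf{u})_y = 0, \qquad (11)$$

where

$$\mathbf{u} = \begin{pmatrix} \rho \\ \rho u \\ \rho v \\ e \end{pmatrix}, \quad
F(\mathbf{u}) = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u H \end{pmatrix}, \quad
G(\mathbf{u}) = \begin{pmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v H \end{pmatrix}, \qquad (12)$$

where the enthalpy H is defined by

$$H = \frac{e + p}{\rho} = \frac{c^2}{\gamma - 1} + \frac{u^2 + v^2}{2}, \qquad (13)$$

the speed of sound

$$c^2 = \frac{\gamma p}{\rho} \qquad (14)$$

and the pressure

$$p = (\gamma - 1)\left(e - \rho \frac{u^2 + v^2}{2}\right). \qquad (15)$$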
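The thermodynamic relations for pressure, enthalpy and speed of sound translate directly into code. The sketch below assumes a perfect gas with $\gamma = 1.4$ (a conventional value that the text does not specify) and uses illustrative function names.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats; not specified in the text


def pressure(rho, rho_u, rho_v, e, gamma=GAMMA):
    """p = (gamma - 1) * (e - rho * (u^2 + v^2) / 2) from conserved variables."""
    u, v = rho_u / rho, rho_v / rho
    return (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))


def enthalpy(rho, rho_u, rho_v, e, gamma=GAMMA):
    """Total enthalpy H = (e + p) / rho."""
    return (e + pressure(rho, rho_u, rho_v, e, gamma)) / rho


def sound_speed(rho, rho_u, rho_v, e, gamma=GAMMA):
    """Speed of sound c = sqrt(gamma * p / rho)."""
    return math.sqrt(gamma * pressure(rho, rho_u, rho_v, e, gamma) / rho)


if __name__ == "__main__":
    rho, u, v, p = 1.2, 0.3, -0.1, 0.9
    e = p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)
    print(pressure(rho, rho * u, rho * v, e), enthalpy(rho, rho * u, rho * v, e))
```

A useful consistency check is that H computed as (e + p)/rho agrees with $c^2/(\gamma - 1) + (u^2 + v^2)/2$ for any admissible state.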


The quasilinear non-conservative formulation of the Euler system in auxiliary
variables (s, u, v, p) can be introduced in two dimensions as well

$$s_t + u s_x + v s_y = 0$$
$$\rho u_t + \rho u u_x + \rho v u_y + p_x = 0$$
$$\rho v_t + \rho u v_x + \rho v v_y + p_y = 0 \qquad (16)$$
$$p_t + u p_x + v p_y + \rho c^2 (u_x + v_y) = 0$$

where $ds = d\rho - dp/c^2$.

Remark 3. Note that the entropy (s) evolution is subject to the two-dimensional
advection equation, which is locally decoupled from the rest of the system.
The fluctuation of the system (11) defined over the triangle T is

$$R = \iint_T \mathbf{u}_t \, dx\, dy = -\iint_T (F_x + G_y)\, dx\, dy = -S_T [\hat{F}_x + \hat{G}_y], \qquad (17)$$

where $\hat{F}_x$, $\hat{G}_y$ are some averaged values of the flux derivatives over the triangle T.

Our construction of the truly two-dimensional Euler scheme utilizes the two-dimensional
conservative linearization procedure [10]. We assume that the quantity
which varies linearly over an element is the "parameter vector"

$$m = \sqrt{\rho}\,(1, u, v, H)^T \qquad (18)$$

and its averaged value on the triangle T (as illustrated on Fig.1) is given by the
following

$$\bar{m} = \frac{m_0 + m_3 + m_4}{3}. \qquad (19)$$

Roe-averaged quantities can be introduced

$$\bar{u} = \bar{m}_2/\bar{m}_1, \qquad \bar{v} = \bar{m}_3/\bar{m}_1, \qquad \bar{H} = \bar{m}_4/\bar{m}_1 \qquad (20)$$

and

$$\bar{c}^2 = (\gamma - 1)\left[\bar{H} - \tfrac{1}{2}(\bar{u}^2 + \bar{v}^2)\right]. \qquad (21)$$
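The averaging step can be sketched as follows, assuming the parameter vector $m = \sqrt{\rho}(1, u, v, H)^T$ is averaged arithmetically over the three vertices of the triangle and that the averaged velocity, enthalpy and sound speed are formed from ratios of its components. The function names and the value $\gamma = 1.4$ are assumptions for illustration.

```python
import math


def parameter_vector(rho, u, v, H):
    """Parameter vector m = sqrt(rho) * (1, u, v, H)."""
    s = math.sqrt(rho)
    return (s, s * u, s * v, s * H)


def roe_average(states, gamma=1.4):
    """Averaged (u, v, H, c^2) on a triangle.

    `states` is a sequence of three (rho, u, v, H) tuples, one per vertex.
    """
    vectors = [parameter_vector(*state) for state in states]
    m_bar = [sum(m[k] for m in vectors) / len(vectors) for k in range(4)]
    u_bar = m_bar[1] / m_bar[0]
    v_bar = m_bar[2] / m_bar[0]
    h_bar = m_bar[3] / m_bar[0]
    c2_bar = (gamma - 1.0) * (h_bar - 0.5 * (u_bar * u_bar + v_bar * v_bar))
    return u_bar, v_bar, h_bar, c2_bar


if __name__ == "__main__":
    vertex = (1.2, 0.3, -0.1, 3.0)
    print(roe_average([vertex, vertex, vertex]))
```

For three identical vertex states the averages reduce to the vertex values themselves, which is a quick sanity check on an implementation.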

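Fluctuations of the Euler system in the auxiliary variables can be presented as

$$r = r^x + r^y, \qquad (22)$$

where

$$r^x = -S_T A \cdot (s_x, \rho u_x, \rho v_x, p_x)^T, \qquad r^y = -S_T B \cdot (s_y, \rho u_y, \rho v_y, p_y)^T,$$

$$A = \begin{pmatrix} \bar{u} & 0 & 0 & 0 \\ 0 & \bar{u} & 0 & 1 \\ 0 & 0 & \bar{u} & 0 \\ 0 & \bar{c}^2 & 0 & \bar{u} \end{pmatrix}, \qquad
B = \begin{pmatrix} \bar{v} & 0 & 0 & 0 \\ 0 & \bar{v} & 0 & 0 \\ 0 & 0 & \bar{v} & 1 \\ 0 & 0 & \bar{c}^2 & \bar{v} \end{pmatrix},$$

$S_T = h^2/2$ is the area of the triangle T, and

$$\rho_x = 2\bar{m}_1 (m_1)_x \qquad (23)$$
$$\rho u_x = \bar{m}_1 (m_2)_x - \bar{m}_2 (m_1)_x \qquad (24)$$
$$\rho v_x = \bar{m}_1 (m_3)_x - \bar{m}_3 (m_1)_x \qquad (25)$$
$$p_x = \frac{\gamma - 1}{\gamma}\left[(\bar{m}_4 (m_1)_x + \bar{m}_1 (m_4)_x) - (\bar{m}_2 (m_2)_x + \bar{m}_3 (m_3)_x)\right]. \qquad (26)$$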
The corresponding terms involving derivatives in the y direction can be written in
the analogous manner.

Introducing the matrix

$$C_a = \begin{pmatrix}
1 & 0 & 0 & 1/\bar{c}^2 \\
\bar{u} & 1 & 0 & \bar{u}/\bar{c}^2 \\
\bar{v} & 0 & 1 & \bar{v}/\bar{c}^2 \\
(\bar{u}^2 + \bar{v}^2)/2 & \bar{u} & \bar{v} & 1/(\gamma - 1) + (\bar{u}^2 + \bar{v}^2)/(2\bar{c}^2)
\end{pmatrix} \qquad (27)$$

we can define

$$R^x = C_a r^x, \qquad R^y = C_a r^y. \qquad (28)$$

It can be easily verified that

$$R^x = -S_T \hat{F}_x, \qquad R^y = -S_T \hat{G}_y, \qquad (29)$$

where $\hat{F}_x$, $\hat{G}_y$ are the same averaged flux derivative values as defined in [10]. It is
also obvious that the entire fluctuation

$$R = R^x + R^y = C_a (r^x + r^y) = C_a r. \qquad (30)$$


Consider triangle T as illustrated in Fig.l. The fluctuation is distributed according 
to the following formulae: 

=5< C,[r"(/- sign(i))] 

= Su^ +|C'a[r®(/+ sign(A)) + r^(/- sign(jB))] (31) 

+^Ca[ry{I + sign{B))] 

we obtain the scheme that is similar to the standard Roe dimensionally split scheme. 
The only difference is in the linearization procedure. 

We can now construct a (linearity preserving) second order accurate scheme. First,
we shall introduce vectors $r^{x*}$, $r^{y*}$ with their elements defined by

$$r_i^{x*} = r_i^x + \Psi(q_i) r_i^y, \qquad r_i^{y*} = r_i^y - \Psi(q_i) r_i^y \qquad (32)$$

for i = 1,2,3,4, where

$$q_i = -\frac{r_i^x}{r_i^y} \qquad (33)$$

and $\Psi$ is a (non-compressive) limiter.

Substituting $r^{x*}$, $r^{y*}$ for $r^x$, $r^y$ in (31) we obtain a genuinely two-dimensional
scheme, which is also linearity preserving (second order accurate in this case) and
conservative.

Some attributes and properties of the genuinely multidimensional schemes will be 
discussed later in [9]. In order to obtain an efficient implementation of the scheme 
described above, it is important to write down the explicit expressions for the matrices
sign(A), sign(B). Denote

$$M_x = \mathrm{sign}(A), \qquad M_y = \mathrm{sign}(B).$$

For matrix $M_x$ the distinction should be made between two cases

$$M_x = \begin{cases} M_x^{sub} & \text{if } |u| < c \\ M_x^{sup} & \text{if } |u| > c, \end{cases} \qquad (34)$$

and similarly

$$M_y = \begin{cases} M_y^{sub} & \text{if } |v| < c \\ M_y^{sup} & \text{if } |v| > c, \end{cases} \qquad (35)$$

where

$$M_x^{sup} = \mathrm{sign}(u) I, \qquad (36)$$

$$M_y^{sup} = \mathrm{sign}(v) I \qquad (37)$$

and I is the 4 x 4 unity matrix. These matrices for the subsonic case appear to be
surprisingly simple as well
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(38) 


(39) 


Their structure indicates that there are some intriguing similarities between the stan¬ 
dard schemes used for incompressible flow computations and the multidimensional 
upwind scheme presented above (see [9]). 

Remark 4. The scheme formulated here can be extended to the case of general
unstructured grids in a straightforward way. Having a general triangular element, one
has to introduce a new (possibly non-orthogonal) coordinate system whose axes align
with two chosen faces of this element (Fig.2). The Euler system has to be rewritten in
these new coordinates. Then one can follow directly the procedure of constructing the
fluctuation distribution formulae presented in this section (see [9] for more details).


NUMERICAL EXPERIMENTS 


The purpose of the numerical experiments reported in this section is to verify the
robustness of the constructed scheme and the quality of the numerical solutions
obtained by its means. Some experiments illustrating the performance of the multigrid
algorithm using this scheme are presented as well.

Supersonic flow in a channel with a bump


The test case considered here is a supersonic (Mach = 2.9) flow in a channel with a
circular bump. The bump is located at the lower wall of the channel at $1 \le x \le 2$,
and its surface is a circular arc of $\pi/3$ and radius 1. Note that the actual shape of
the domain is a rectangle. The influence of the bump on the flow is imposed through
the boundary conditions: the velocity component normal to the surface of the bump
at a certain location is being reflected.
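The bump geometry can be sanity-checked with elementary trigonometry: a circular arc of angle $\theta$ and radius r spans a chord of length $2r\sin(\theta/2)$ and rises $r(1 - \cos(\theta/2))$ above it. The short sketch below uses illustrative function names; it only checks the stated dimensions and is not part of the flow solver.

```python
import math


def arc_chord(radius, angle):
    """Length of the chord subtended by a circular arc of the given angle."""
    return 2.0 * radius * math.sin(angle / 2.0)


def arc_height(radius, angle):
    """Maximum height of the arc above its chord."""
    return radius * (1.0 - math.cos(angle / 2.0))


if __name__ == "__main__":
    print(arc_chord(1.0, math.pi / 3.0), arc_height(1.0, math.pi / 3.0))
```

An arc of $\pi/3$ and radius 1 has a chord of length 1, which matches a bump occupying an interval of unit length on the wall.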

The first experiment uses a grid of size 200 x 40 points. The density contour plots
of the steady-state solution are presented in Fig.3(a). The scheme used is the one
given by (31), (32) with the minmod limiter.

The second experiment, presented in Fig.3(b), corresponds to the same settings,
except that the grid is twice as fine (400 x 80 points). As expected, the grid refinement
results in a better resolution of the flow features.


Transonic flow over a circular bump 


The test case considered here is a transonic flow (free-stream Mach = 0.9) over a flat
wall with a bump (Fig.4). The surface of the bump is a circular arc of $\pi/3$ and radius
1 and its location is $3.5 \le x \le 4.5$. Again, in order to keep the experiments
simple at this stage of work, the bump is treated the same way as in the previous
experiments. The grid is 200 x 200 points. The shock of the "fish-tail" shape can be
clearly observed in Fig.4.

Low Mach number flow over a circular bump

Here we present a numerical experiment concerning a low Mach number (= 0.1) flow
over a flat wall with a circular (arc of $\pi/3$ and radius 2) bump. Here, as in the
previous case, the presence of the bump is imitated through the appropriate boundary
conditions. The grid is 200 x 200 points. The density contours of the steady-state
solution are presented in Fig.5.


Multigrid algorithm 

To illustrate the performance of the multigrid algorithm we consider here the well 
known test case of a shock reflecting from a flat wall. The multigrid algorithm involves 
five grids (levels): the finest consists of 129 x 33 points, the coarsest is 9 x 3 points. 
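The quoted grid hierarchy follows from standard coarsening, in which every other grid line is dropped in each direction. A minimal sketch, assuming vertex-centered grids where n points coarsen to (n - 1)/2 + 1 points (the function names are illustrative):

```python
def coarsen(n):
    """Number of points after removing every other grid line."""
    if n < 3 or (n - 1) % 2 != 0:
        raise ValueError("grid with %d points cannot be coarsened" % n)
    return (n - 1) // 2 + 1


def grid_hierarchy(nx, ny, levels):
    """List of (nx, ny) sizes from the finest grid down to the coarsest."""
    grids = [(nx, ny)]
    for _ in range(levels - 1):
        nx, ny = coarsen(nx), coarsen(ny)
        grids.append((nx, ny))
    return grids


if __name__ == "__main__":
    for size in grid_hierarchy(129, 33, 5):
        print(size)
```

Starting from 129 x 33 points, four coarsening steps lead to 9 x 3 points, so five levels is the deepest hierarchy this grid admits under standard coarsening.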

The multigrid algorithm is based on the same two-dimensional scheme used with
the lexicographic Gauss-Seidel relaxation. The restriction and prolongation procedures
are the standard Full Weighting of the residuals and bilinear correction
interpolation. The numerical solution to this problem obtained by the 2FMG-W(2,1)
algorithm is presented in Fig.6(a). Fig.6(b) presents the numerical solution obtained
using the same algorithm but performing three more cycles (five total) on the finest
level.

Note that in this case the flow is aligned with the x-direction in a significant
part of the domain. In this case the artificial viscosity in the cross-stream direction
in the entropy and u-momentum equations vanishes. Therefore, no smoothing can
be obtained in the y-direction in some components. A multigrid algorithm utilizing
time-stepping type relaxation can deal with such a situation only by using the
semi-coarsening technique. Our algorithm employs the Gauss-Seidel relaxation. Therefore,
it offers a much simpler and more efficient treatment of this problem: relaxation with
lexicographic ordering in the stream direction.

The rate of convergence observed in this test case, as well as in other simple
experiments concerning a variety of flow regimes, is very close to 0.75.

DISCUSSION AND FUTURE WORK 


Summary of the current work 

A new two-dimensional high-resolution (at the steady-state) scheme for the compressible
Euler equations was presented. It is triangle-based and can be formulated with
the same degree of simplicity both on structured and unstructured grids. The main
advantage of this scheme is that Gauss-Seidel relaxation can be applied directly to
the resulting discrete equations. This allows construction of a simple and efficient
multigrid steady-state solver.

A remarkable property of the constructed scheme is also its very compact stencil:
it involves only the immediate neighbors of the point of interest.

A variety of flow regimes (supersonic, transonic and low Mach number flow) were
considered in the numerical experiments to verify the quality of the solutions
obtained by means of the new scheme and to demonstrate the efficiency of the multigrid
algorithm.

Generalization of this scheme to three dimensional tetrahedral meshes is
straightforward (see [9]).


Further improvement of the multigrid efficiency 

The main obstacle preventing the further improvement of the multigrid efficiency 
is the following fact: for the hyperbolic problems the coarse grid correction is not 
sufficient for certain error components. 

This difficulty was already addressed in the literature and some techniques to 
improve the multigrid efficiency were developed in [11]. Therefore, one possibility is 
to adapt these techniques for our case - compressible Euler equations. 
Figure 4: Transonic flow over a wall with a circular bump (free stream Mach = 0.9).

ALGEBRAIC MULTIGRID BY SMOOTHED AGGREGATION

Summary. An algebraic multigrid algorithm is developed based on prolongations by smoothed
aggregation. Coarse levels are generated automatically. Guidelines for the selection of method components
are presented based on energy considerations. Efficiency of the resulting algorithm is demonstrated by
computational results.

1. Introduction. Multigrid methods are very efficient iterative solvers for systems
of algebraic equations arising from finite element and finite difference discretizations
of elliptic boundary value problems. The main principle of multigrid methods is to
complement the local exchange of information in point-wise iterative methods by a global
one utilizing several related systems, called coarse levels, with a smaller number of
variables. The coarse levels are often obtained as a hierarchy of discretizations with
different characteristic meshsizes, but this requires that the discretization is controlled
by the iterative method. To solve linear systems produced by existing finite element
software, one needs to create an artificial hierarchy of coarse problems. The principal
issue is then to obtain computational complexity and approximation properties similar
to those for nested meshes, using only information in the matrix of the system and as
little extra information as possible.

Such an algebraic multigrid method, which uses the system matrix only, was developed 
by Ruge et al. [10, 4, 11]. The prolongations were based on the matrix of the system 
by partial solution from given values at selected coarse points [1]. The coarse grid 
points were selected so that each point would be interpolated to via so-called strong 
connections. 

Our approach is based on smoothed aggregation, introduced recently by Vanek [14, 
13]. First the set of nodes is decomposed into small mutually disjoint subsets. A 
tentative interpolation (in the discrete sense) is then defined on those 
subsets: piecewise constant for second order problems, and piecewise linear for fourth 
order problems. The prolongation operator is then obtained by smoothing the output of 
the tentative prolongation, and coarse level operators are defined variationally. A multigrid 
method based on such prolongations converges very fast for a wide range of problems, 
including those with strongly anisotropic and discontinuous coefficients. In addition, 
it has a remarkably low computational complexity, since the typical coarsening ratio is 
about three in each dimension. 

Almost optimal theoretical bounds for our method were given by the authors in [15] 
for second order problems and under natural assumptions on the coarse level hierarchy 
that tend to be satisfied by our coarsening algorithm, namely that the coarsening is by 
about the factor of three, and that the aggregates of the nodes are based on aggregated 
elements that form a reasonable mesh of macroelements. A bound on the energy of 
the coarse level basis functions was proved and used to verify the assumptions of the 
multilevel regularity-free approach of Bramble, Pasciak, Wang, and Xu [3]. The theory 
can be extended to fourth order problems once similar energy bounds are available for 
that case. 

The part of this paper dealing with second order problems is based on [15]. The 
algorithm for fourth order problems is new. For more details and theory for the second 
order case, see [15]. 

For other multigrid approaches to the biharmonic equation, see [5, 9, 16, 8]. For 
a multigrid theory for the biharmonic equation with non-nested finite element spaces, 
see [2]. 

1.1. Basic Multigrid Algorithm. For reference, we state the basic multigrid 
algorithm for the solution of the system of linear algebraic equations $Ax = b$. First, 
a preprocessing stage creates full rank prolongation matrices $P_l$ of size $n_l \times n_{l+1}$, 
$l = 1, \dots, L-1$, by an automatic coarsening process described below. The coarse level 
matrices are defined by 

$$A_1 = A, \qquad A_{l+1} = P_l^T A_l P_l, \quad l = 1, \dots, L-1.$$

The iterations then proceed as follows. 

Algorithm 1 (Basic multigrid). To solve the system $A_l x^l = b^l$, do: 

Pre-smoothing: do $\nu_1$ times $x^l \leftarrow S^l(x^l, b^l)$ 

Coarse grid correction: 

• let $b^{l+1} = P_l^T (b^l - A_l x^l)$ 

• If $l + 1 = L$, solve $A_{l+1} x^{l+1} = b^{l+1}$ by a direct method, otherwise apply 
$\gamma$ iterations of this algorithm on level $l + 1$, starting with initial guess 
$x^{l+1} = 0$ 

• correct the solution on level $l$ by $x^l \leftarrow x^l + P_l x^{l+1}$ 

Post-smoothing: do $\nu_2$ times $x^l \leftarrow S^l(x^l, b^l)$. 

We use $\nu_1 = \nu_2 = \gamma = 1$, with the pre-smoothing iteration consisting of one forward 
iteration of Gauss-Seidel followed by one iteration of backward SOR. The 
post-smoothing iteration consists of one forward SOR iteration followed by an iteration of 
backward Gauss-Seidel. The over-relaxation parameter used is 1.85 in both pre- and 
post-smoothing. 
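
The recursion of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative dense-matrix sketch, not the authors' code: the function names (`sor_sweep`, `galerkin_hierarchy`, `v_cycle`) are hypothetical, levels are indexed from 0 rather than 1, and the prolongators are assumed to be given. The smoother ordering and the over-relaxation parameter 1.85 follow the description above.

```python
import numpy as np

def sor_sweep(A, x, b, omega, forward):
    """One SOR sweep in place (omega = 1 is Gauss-Seidel), forward or backward."""
    n = len(b)
    order = range(n) if forward else range(n - 1, -1, -1)
    for i in order:
        sigma = A[i] @ x - A[i, i] * x[i]          # sum over j != i of a_ij x_j
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

def galerkin_hierarchy(A, prolongators):
    """Coarse matrices A_{l+1} = P_l^T A_l P_l, starting from the fine matrix A."""
    levels = [A]
    for P in prolongators:
        levels.append(P.T @ levels[-1] @ P)
    return levels

def v_cycle(levels, prolongators, b, x, l=0, gamma=1, omega=1.85):
    """One multigrid cycle on level l (0-based) with nu_1 = nu_2 = 1."""
    A, P = levels[l], prolongators[l]
    x = x.copy()
    # Pre-smoothing: forward Gauss-Seidel, then backward SOR.
    sor_sweep(A, x, b, 1.0, forward=True)
    sor_sweep(A, x, b, omega, forward=False)
    # Coarse grid correction.
    bc = P.T @ (b - A @ x)
    if l + 1 == len(levels) - 1:
        xc = np.linalg.solve(levels[l + 1], bc)     # coarsest level: direct solve
    else:
        xc = np.zeros(len(bc))
        for _ in range(gamma):
            xc = v_cycle(levels, prolongators, bc, xc, l + 1, gamma, omega)
    x = x + P @ xc
    # Post-smoothing: forward SOR, then backward Gauss-Seidel.
    sor_sweep(A, x, b, omega, forward=True)
    sor_sweep(A, x, b, 1.0, forward=False)
    return x
```

With this ordering the post-smoother is the adjoint of the pre-smoother in the energy inner product, so for a symmetric positive definite matrix each cycle reduces the error in the energy norm.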
Each level $l$ is associated with basis functions $\varphi_i^l$, $i = 1, \dots, n_l$. The basis functions on the 
finest level are given as finite element shape functions, while the coarse level basis 
functions are determined from the prolongations by 

$$\varphi_j^{l+1} = \sum_{k=1}^{n_l} (P_l)_{kj}\, \varphi_k^l.$$



2. Algebraic Multigrid for Second Order Problems. Consider discretization 
by standard conforming linear finite elements of a second order elliptic variational 
problem 

$$u \in V: \quad a(u, v) = f(v) \quad \forall v \in V,$$   (2.1) 

where $V = H^1_{0,\Gamma_D}(\Omega)$ denotes the Sobolev space of functions vanishing on $\Gamma_D \subset \partial\Omega$, 
$\mu(\Gamma_D) > c\,\mu(\partial\Omega)$, $\Omega$ a domain in $\mathbb{R}^d$. The bilinear form 

$$a(u, v) = \int_\Omega \sum_{i,j} a_{ij}\, \partial_i u\, \partial_j v$$   (2.2) 

is assumed to be symmetric, V-elliptic, and bounded, 

$$c_1 \|u\|^2_{H^1(\Omega)} \le a(u, u) \le c_2 \|u\|^2_{H^1(\Omega)} \quad \forall u \in V.$$   (2.3) 

Moreover we assume that the finite element basis forms a decomposition of unity 

$$\sum_{i=1}^{n_1} \varphi_i^1 = 1$$   (2.4) 

away from essential boundary conditions. 

2.1. Construction of Prolongations for second order elliptic problems. 
The prolongation operators are chosen to achieve low energy of coarse basis functions, 
leading to good theoretical estimates of the convergence of the iterations, as well as by 
sparsity considerations to achieve low computational complexity of the iterations. We 
are looking for prolongations that satisfy the following properties. First we specify the 
desired properties of the support of the coarse shape functions (or, equivalently, the 
allowed nonzeros of the prolongation matrices), and then the numerical values of the 
nonzero entries. 

(AMG1) Coarse supports should follow strong couplings. We require that every 
two nodes in the support of a coarse basis function can be connected by a path 
of strong couplings. Two nodes $i$ and $j$ on level $l$ are strongly coupled if $|a^l_{ij}|$ 
is relatively large compared with $\sqrt{a^l_{ii} a^l_{jj}}$. Essentially, we want to assure that 
the algorithm will provide semi-coarsening when solving an 
anisotropic problem ([6], [12]). Algebraically, the anisotropy is reflected in the 
coefficients of the stiffness matrix in the sense that the neighboring nodes are 
strongly coupled in the direction of anisotropy. 

(AMG2) Bounded intersection. The support of each basis function intersects only a bounded 
number of supports of other basis functions on the same level. The number 
of intersections does not depend on the level. This property guarantees sparsity 
of the resulting coarse-level matrices. 

(AMG3) Decomposition of unity. Every coarse space $V_l$ should represent the 
constant function exactly, aside from an essential boundary condition. This 
requirement is motivated by the need to bound locally the error of a coarse grid 
approximation of a fine grid function $v^l$ in terms of the energy $(v^l)^T A_l v^l$ 
and by the fact that the constant function has zero energy because of (2.2). 
Because of (2.4), this is equivalent to the requirement that the columns of each 
prolongation matrix form a decomposition of unity 

$$\sum_{j=1}^{n_{l+1}} (P_l)_{ij} = 1$$

for all rows $i$ that do not correspond to degrees of freedom adjacent to an 
essential boundary condition. For generalizations, see Sections 3.1 and 3.3. 
(AMG4) Small energy of coarse basis functions. We require that the energy of 
the coarse space basis functions be almost minimal in the sense that 

:_r 

u€H^{supp(fi\) IPllL2{n) 

Note that in the case of uniformly V-elliptic problems the requirement above, 
together with bounded intersections of supports of basis functions (AMG2), 
assures the standard inverse inequality on each coarse space. 

(AMG5) Uniform $l^2$ equivalence. Discrete $l^2$ norms on all spaces $V_l$ should be 
uniformly equivalent up to diagonal scaling. The scaling may depend on the 
measure of the support of the basis function and on the type of degree of freedom. For the 
algorithm described in this section, such uniform equivalence has been proved 
in [15]. 

We now construct prolongations $P_l$ based on the matrix $A_l$. First we create a 
tentative piecewise constant prolongator satisfying all of the above properties except 
for the energy bound in (AMG4). This prolongator will then be smoothed to satisfy 
(AMG4), while preserving the other properties. 

We start by specifying a disjoint decomposition of the set of nodes on level $l$. Every 
component of the decomposition on level $l$ (a so-called aggregate) gives rise to one degree 
of freedom on level $l + 1$. 

Motivated by the requirement (AMG1) above, for a given $\varepsilon$ we define the 
strongly-coupled neighborhood of node $i$ as 

$$N_i^l(\varepsilon) = \{\, j : |a_{ij}| \ge \varepsilon \sqrt{a_{ii} a_{jj}} \,\} \cup \{i\}.$$   (2.5) 
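
As a small illustration of definition (2.5), the following Python sketch computes the strongly-coupled neighborhood of one node for a dense matrix. The function name `strong_neighborhood` is hypothetical. As an added assumption, zero entries are never counted as couplings, which only matters when $\varepsilon = 0$.

```python
import numpy as np

def strong_neighborhood(A, i, eps):
    """Return N_i(eps) = { j : |a_ij| >= eps * sqrt(a_ii a_jj) } united with {i}."""
    d = np.sqrt(np.abs(np.diag(A)))
    strong = np.abs(A[i]) >= eps * d[i] * d
    strong &= A[i] != 0          # assumption: structural zeros are never couplings
    strong[i] = True             # the node always belongs to its own neighborhood
    return set(np.flatnonzero(strong).tolist())
```

For example, in a row with entries 2, -1 and -0.01 and unit-size diagonal scaling, the entry -0.01 falls below the threshold for eps = 0.08 and is treated as a weak connection.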
Algorithm 2 (Aggregation). Let the matrix $A_l$ of order $n_l$ and $\varepsilon \in [0,1)$ be 
given. Generate a disjoint covering $\{C_j^l\}$ of the set $\{1, \dots, n_l\}$ as follows. 

Initialization Set $R = \{1, \dots, n_l\}$ and $j = 0$. 

Step 1 Select disjoint strongly coupled neighborhoods as the initial attempted 
covering: If there exists a strongly coupled neighborhood $N_i^l(\varepsilon) \subset R$, set $j \leftarrow j + 1$, 
$C_j^l \leftarrow N_i^l(\varepsilon)$, $R \leftarrow R \setminus C_j^l$. Repeat until $R$ does not contain any strongly coupled 
neighborhood. 

Step 2 Add each remaining $i \in R$ to one of the sets already selected to which it is 
strongly connected, if possible: 

Copy $\tilde C_k^l = C_k^l$, $k = 1, \dots, j$. 

If there exists $i \in R$ and $k$ such that $N_i^l(\varepsilon) \cap \tilde C_k^l \ne \emptyset$, then set $C_k^l \leftarrow C_k^l \cup \{i\}$. 

Repeat until no such $i$ exists. 

Step 3 Make the remaining $i \in R$ into aggregates that consist of subsets of strongly 
coupled neighborhoods: If there exists $i \in R$, set $j \leftarrow j + 1$ and $C_j^l = R \cap N_i^l(\varepsilon)$. 
Repeat until $R = \emptyset$. 

Define the tentative prolongation $\tilde P_l$ by the aggregates $C_j^l$: 

$$(\tilde P_l)_{ij} = \begin{cases} 1 & \text{if } i \in C_j^l \\ 0 & \text{otherwise} \end{cases}$$   (2.6) 
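
The three steps of Algorithm 2 and the tentative prolongator (2.6) can be sketched as follows. This is a minimal dense-matrix illustration under stated assumptions, not the authors' implementation: the names `neighborhoods`, `aggregate` and `tentative_prolongator` are hypothetical, nodes are numbered from 0, candidates are visited in increasing index order, zero entries are never treated as couplings, and each node is removed from $R$ as soon as it is assigned to an aggregate.

```python
import numpy as np

def neighborhoods(A, eps):
    """Strongly coupled neighborhoods N_i(eps) of all nodes, as Python sets."""
    d = np.sqrt(np.abs(np.diag(A)))
    result = []
    for i in range(A.shape[0]):
        strong = (np.abs(A[i]) >= eps * d[i] * d) & (A[i] != 0)
        strong[i] = True
        result.append(set(np.flatnonzero(strong).tolist()))
    return result

def aggregate(A, eps):
    """Disjoint covering of {0, ..., n-1} by aggregates, following Steps 1 to 3."""
    n = A.shape[0]
    N = neighborhoods(A, eps)
    R = set(range(n))
    C = []
    # Step 1: pick neighborhoods that still lie entirely inside R.
    for i in range(n):
        if N[i] <= R:
            C.append(set(N[i]))
            R -= N[i]
    # Step 2: attach leftover nodes to a Step 1 aggregate they are coupled to.
    frozen = [set(c) for c in C]          # copies, so that the test uses Step 1 sets
    for i in sorted(R):
        for k, ck in enumerate(frozen):
            if N[i] & ck:
                C[k].add(i)
                R.discard(i)
                break
    # Step 3: the remaining nodes form aggregates inside their neighborhoods.
    while R:
        i = min(R)
        new = R & N[i]
        C.append(new)
        R -= new
    return C

def tentative_prolongator(C, n):
    """Piecewise constant prolongator: entry (i, j) is 1 if node i is in aggregate j."""
    P = np.zeros((n, len(C)))
    for j, cj in enumerate(C):
        P[sorted(cj), j] = 1.0
    return P
```

Because the aggregates form a disjoint covering, every row of the tentative prolongator contains exactly one unit entry.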


The piecewise constant prolongation $\tilde P_l$ will now be improved by a smoothing to 
get the final prolongation matrix $P_l$. We choose a simple Jacobi smoother, giving the 
prolongation matrix 

$$P_l = (I - \omega D^{-1} A_l^F)\, \tilde P_l,$$   (2.7) 

where $A_l^F = (a_{ij}^F)$ is the filtered matrix given by 

$$a_{ij}^F = \begin{cases} a_{ij} & \text{if } j \in N_i^l(\varepsilon) \\ 0 & \text{otherwise} \end{cases} \quad \text{if } i \ne j, \qquad a_{ii}^F = a_{ii} + \sum_{j=1,\, j \ne i}^{n_l} (a_{ij} - a_{ij}^F),$$   (2.8) 

and $D$ denotes the diagonal of $A_l^F$. 

When applying Algorithm 2 to uniformly elliptic problems, one usually obtains 
coarsening by about a factor of 3 in each dimension, and the resulting coarse level matrix 
$A_{l+1}$ tends to follow the nonzero pattern of the 9-point stencil. The filtration (2.8) has 
little or no effect in this case. 

In the case of anisotropic problems, however, the application of the smoother with 
the unfiltered matrix would make the supports of basis functions overlap extensively in 
the direction of weak connections. Here the filtration prevents the undesired overlaps 
of the coarse space basis functions. By construction, $A_l^F$ typically makes the nonzero 
pattern of $A_{l+1}$ follow the 9-point stencil as in the uniformly V-elliptic case. It also 
assures that a constant remains the local kernel of $A_l^F$ at every point where a constant 
is the local kernel of $A_l$. Consequently, for problems without zero-order term the final 
prolongator $P_l$ satisfies the decomposition of unity away from the essential boundary 
conditions. 

Fig. 2.1. The basis functions given by aggregation and the corresponding smoothed basis 
for 1D Laplacian, using the smoother $I - \frac{2}{3} D^{-1} A$. 

Fig. 2.1 shows the 1D coarse basis functions resulting from prolongation by 
aggregation and by smoothed aggregation. Note that for the 1D Laplace operator and the choice 
of $\omega = 2/3$ in (2.7), the smoothed coarse space basis is exactly that of P1 finite 
elements. Fig. 2.2 shows the typical aggregates obtained on an unstructured grid. The 
corresponding supports are formed by adding one belt of elements to the aggregates. 
The smoothing adds at most one more belt of adjacent elements. 
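
The remark about the 1D Laplacian can be checked numerically. The sketch below is an illustration only, with hypothetical function names: it builds a filtered matrix in which dropped off-diagonal entries are lumped onto the diagonal so that row sums are preserved (an assumption chosen to match the kernel-preservation property stated in the text), and applies one Jacobi smoothing step with $\omega = 2/3$ to a piecewise constant prolongator on aggregates of three nodes.

```python
import numpy as np

def filtered_matrix(A, eps):
    """Keep strong off-diagonal entries; lump the dropped ones onto the diagonal."""
    d = np.sqrt(np.abs(np.diag(A)))
    strong = np.abs(A) >= eps * np.outer(d, d)
    np.fill_diagonal(strong, False)
    AF = np.where(strong, A, 0.0)                  # strong off-diagonals only
    dropped = (A - AF).sum(axis=1) - np.diag(A)    # row sums of weak off-diagonals
    np.fill_diagonal(AF, np.diag(A) + dropped)     # row sums of A are preserved
    return AF

def smoothed_prolongator(A, P_tent, eps, omega=2.0 / 3.0):
    """P = (I - omega D^{-1} A^F) P_tent, with D the diagonal of A^F."""
    AF = filtered_matrix(A, eps)
    D = np.diag(AF)
    return P_tent - omega * (AF @ P_tent) / D[:, None]

# 1D Laplacian on 9 nodes, aggregates {0,1,2}, {3,4,5}, {6,7,8}.
n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P_tent = np.kron(np.eye(3), np.ones((3, 1)))
P = smoothed_prolongator(A, P_tent, eps=0.08)
```

The middle column of `P` is the hat function with nodal values 1/3, 2/3, 1, 2/3, 1/3, that is, the P1 basis function on a mesh three times coarser.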

We choose 

$$\varepsilon = 0.08 \left(\tfrac{1}{2}\right)^{l-1}, \qquad \omega = \tfrac{2}{3}.$$

The theory for the above method can be found in [15]. 

3. Generalizations. 

3.1. High order elements and unscaled problems. The decomposition of 
unity (2.4) may be violated in practice. In such a case, in order to construct coarse 
spaces representing the constant function exactly, we need the representation of unity 
with respect to the finite element basis of the finest space $V_1$ as user input data. More 
specifically, we need the vector $\alpha \in \mathbb{R}^{n_1}$ satisfying 

$$\sum_{i=1}^{n_1} \alpha_i \varphi_i^1 = 1$$

away from essential boundary conditions. 

The definition (2.6) of the auxiliary prolongators remains in place for all levels but 
level 1; we define $\tilde P_1$ as 

$$(\tilde P_1)_{ij} = \begin{cases} \alpha_i & \text{if } i \in C_j^1 \\ 0 & \text{otherwise} \end{cases}$$   (3.1) 

Thus, the unit constant function is represented by the vector $\alpha = (\alpha_i)_{i=1}^{n_1}$ on the finest 
level, while on levels 2 to $L$, the unit constant function is represented by vectors of 
all ones. The process can be easily generalized to the nonscalar case using the block 
approach described in Section 3.2. It was applied to the problem from Example No. 1 
of Section 5 modified by scaling the basis functions randomly in the interval [0.01, 1]. 
The results are summed up in Example No. 5. 

Fig. 2.2. Typical 2D aggregates. 

3.2. Vector problems. In the case of nonscalar problems, the coarsening 
algorithm as described in Section 2 is likely to produce aggregates of physically 
incompatible degrees of freedom, causing deterioration of convergence. This phenomenon can, 
however, be overcome by using the so-called block approach, which consists in replacing the 
scalar operations on the level of degrees of freedom by their block counterparts on the 
level of nodes. Let $n_d$ denote the number of degrees of freedom per node (assumed to be 
constant) and $df(i)$ be the list of degrees of freedom associated with the node $i$. The 
communication between the neighboring nodes $k$, $l$ can now be expressed in the form of 
a matrix selection $A_{kl}$ of order $n_d$, 

$$A_{kl} = A(df(k), df(l)).$$   (3.2) 

The definition of the strongly coupled neighborhood of node $i$ (2.5) is now replaced by 

$$N_i^l(\varepsilon) = \{\, j : \|A_{ij}\| \ge \varepsilon \sqrt{\|A_{ii}\|\, \|A_{jj}\|} \,\} \cup \{i\},$$   (3.3) 

where $\|\cdot\|$ is a matrix norm. Further, in the definition of auxiliary prolongations (2.6), 
we replace the numbers 1 and 0 by identity and zero matrices of order $n_d$, respectively. 
The efficiency of this generalization is demonstrated by Experiments No. 1 and No. 5 
in Section 5. 

3.3. Absolute term. Consider now (2.1) modified by adding a positive absolute 
term 

$$a(u, v) = \int_\Omega \sum_{i,j} a_{ij}\, \partial_i u\, \partial_j v + q u v, \qquad q > 0.$$

In this case, the prolongation smoother loses its constant-preserving property because 
the constant is no longer locally in the kernel of $A_l$. Fortunately, the presence of the 
absolute term improves the condition number of $A_l$, thus compensating for the loss of 
the preservation of a constant. 

For large $q$, the absolute term also has the effect of boosting the diagonal dominance 
in certain (block) columns. The nodes corresponding to these columns are then treated 
by Algorithm 2 as isolated nodes, and the coarsening process may stall. Note that the 
same phenomenon may also result from certain treatments of the essential boundary 
conditions. This difficulty can be overcome by a simple modification: removing 
these nodes from the set $R$ in Algorithm 2 prevents the stalling. At the same time, it 
does not harm the convergence of the overall method, because the smoothers are very 
efficient at approximating values in numerically isolated nodes. 

4. Method for High order problems. For elliptic problems of order $2K$, $K > 1$, 
the requirements on prolongators have to be slightly stronger. Instead of the decomposition 
of unity (AMG3) we now need a more general requirement. 

(AMG3') Every coarse space $V_l$ must represent polynomials of degree up to $K - 1$ 
exactly, away from the essential boundary conditions. As in the case of second order 
problems, this requirement is motivated by the need to control the coarse-grid 
approximation of $v^l$ by the energy $(v^l)^T A_l v^l$ and by the fact that norm and seminorm are 
equivalent on the factor space modulo polynomials of degree up to $K - 1$. 

Second, the small energy of coarse basis functions (AMG4) must be replaced by 
its straightforward generalization. 

(AMG4’) We require that the energy of coarse space basis functions be almost 
minimal in the sense that 

II M|2 — ^ mi jl j|2 • 

IIV^illL2(n) IPllL2(n) 

Unfortunately, the construction of prolongators resulting in coarse spaces 
satisfying (AMG3') for $K > 1$ is not possible without additional user input. In order to be 
able to approximate the polynomials of degree up to $K - 1$ by coarse space 
functions exactly, we need their representation with respect to the finest level basis. 
Finally, assumption (AMG5) may be satisfied with different scaling for each type of 
degree of freedom. 

For the elliptic problem of order $2K$ on the domain $\Omega \subset \mathbb{R}^d$, we need vectors 
$p^{(0)}, p^{(i,j)} \in \mathbb{R}^{n_1}$, $i = 1, \dots, K - 1$, $j = 1, \dots, d$, satisfying 

$$\sum_{k=1}^{n_1} p_k^{(0)} \varphi_k^1 = 1, \qquad \sum_{k=1}^{n_1} p_k^{(i,j)} \varphi_k^1 = x_j^i$$   (4.1) 

away from the essential boundary conditions. For example, to solve the biharmonic 
equation in 2D, we need $p^{(0)}$, $p^{(1,1)}$ and $p^{(1,2)}$, the representations with respect to the 
fine-level basis of the planes $z = 1$, $z = x$, $z = y$, respectively. 

The coarsening technique we are using is a natural generalization of the concept 
of smoothed aggregation described in Section 2.1. The aggregation step (2.6) can be 
viewed as a restriction of the unit vector to aggregates $C_i^l$, which gives rise to one degree 
of freedom on the level $l + 1$ for each $C_i^l$. Here, tentative prolongators will be generated by 
restricting all the vectors $p^{l,(0)}, p^{l,(i,j)}$ to the aggregates $C_i^l$. Each aggregate will be represented 
by a set of degrees of freedom, where every degree of freedom corresponds to one of 
the vectors $p^{l,(0)}, p^{l,(i,j)}$ (see Fig. 4.1). The shape of the basis functions derived from the 
nonconstant polynomials depends on the position of the aggregate. More specifically, 
far away from the origin, basis functions derived from polynomials of higher degree 
contain a large low degree polynomial component, which results in the violation of the 
uniform equivalence of discrete and continuous $L^2$-norms. This undesirable effect is 
suppressed by a local $l^2$ Gram-Schmidt orthogonalization process performed on each 
aggregate $C_i^l$ (see Fig. 4.2). Again, the resulting prolongator will be smoothed by the 
Jacobi smoother (see Fig. 4.3). 

Fig. 4.1. The coarse-space basis given by the restriction of $p^{(0)}$ and $p^{(1,1)}$ onto aggregates of nodes. 

Fig. 4.2. The coarse-space basis after $l^2$ Gram-Schmidt modification. 

Fig. 4.3. The final smoothed basis. 

The following is a generalization of the algorithm of Section 2.1 to the case of 
problems of order $2K$, $K > 1$. 

Algorithm 3 (Coarsening of high-order problems). We assume the 
number of degrees of freedom per node on the finest level to be constant. Let $n_1$ be the number 
of nodes on the finest level, and let $df^1(i)$ denote the list of degrees of freedom associated 
with the node $i$. We set $p^{1,(0)} = p^{(0)}$, $p^{1,(i,j)} = p^{(i,j)}$, $i = 1, \dots, K - 1$, $j = 1, \dots, d$ (see 
(4.1)). 

Step 1 - Decomposition. Generate the disjoint covering $\{C_i^l\}$ of the set of nodes 
$\{1, \dots, n_l\}$ using Algorithm 2, where the strongly coupled neighborhood of 
$i$ is defined by (3.3) and $A_{ij}$ is the selection $A_l(df^l(i), df^l(j))$. 

Step 2 - Restriction. For each aggregate $C_i^l$ define the index set $D_i^l$ of all degrees of 
freedom associated with nodes in $C_i^l$, i.e. 

$$D_i^l = \bigcup_{j \in C_i^l} df^l(j).$$

For every $D_i^l$ generate auxiliary sparse vectors $v^{l,i,1}, \dots, v^{l,i,n_p}$ by 

$$v^{l,i,1} = p^{l,(0)}|_{D_i^l}, \quad v^{l,i,2} = p^{l,(1,1)}|_{D_i^l}, \quad \dots, \quad v^{l,i,n_p} = p^{l,(K-1,d)}|_{D_i^l},$$

where $2K$ is the order of the equation, $d$ is the number of space variables, 
and $n_p = (K - 1)d + 1$ is the number of the user supplied polynomials. $v|_I$ 
denotes the restriction of the vector $v$ to the index set $I$ in the sense that $(v|_I)_i = v_i$ 
if $i \in I$, zero otherwise. 

Step 3 - Gram-Schmidt modification. For each aggregate $C_i^l$ update the set of 
associated sparse vectors generated in Step 2 by the $l^2$ Gram-Schmidt 
orthogonalization process in the ordering $v^{l,i,1}, \dots, v^{l,i,n_p}$ (i.e., vectors derived from 
low-degree polynomials are processed first). Note that the representation of 
unity remains unchanged by the process. 

Step 4 - Building of auxiliary prolongators. Generate the auxiliary prolongator $\tilde P_l$ 
whose $(n_p(i - 1) + j)$-th column consists of the vector $v^{l,i,j}$ and create the 
corresponding coarse-level list of degrees of freedom associated with node $i$, 

$$df^{l+1}(i) = \{ n_p(i - 1) + 1,\ n_p(i - 1) + 2,\ \dots,\ n_p i \}, \quad i = 1, \dots, n_{l+1}.$$

Step 5 - Representation of polynomials on the coarse level. Generate vectors 
$p^{l+1,(0)}, p^{l+1,(1,1)}, \dots, p^{l+1,(K-1,d)}$ satisfying 

$$p^{l,(0)} = \tilde P_l\, p^{l+1,(0)}, \quad p^{l,(1,1)} = \tilde P_l\, p^{l+1,(1,1)}, \quad \dots, \quad p^{l,(K-1,d)} = \tilde P_l\, p^{l+1,(K-1,d)}.$$

As $\{C_i^l\}$ is a disjoint covering, the columns corresponding to different aggregates 
are $l^2$-orthogonal and, consequently, the global Gram matrix given by the columns 
of $\tilde P_l$ is a block diagonal matrix. Therefore, $p^{l+1,(0)}, \dots, p^{l+1,(K-1,d)}$ are 
computed by solving the local problems with Gram matrices generated by the columns 
of the prolongator $\tilde P_l$ associated with $C_i^l$, 

$$G_i^l = \{ (v^{l,i,j}, v^{l,i,k}) \}_{j,k=1}^{n_p}.$$

Step 6 - Final smoothing. Improve the prolongator $\tilde P_l$ by the smoothing step (2.7), (2.8), 
where scalar entries $a_{ij}$ are replaced by blocks $A_{ij} = A_l(df^l(i), df^l(j))$. 
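
Steps 2 to 5 can be illustrated with a short Python sketch. It is not the procedure of Algorithm 3 verbatim: as a swapped-in simplification, a local QR factorization replaces the Gram-Schmidt process of Step 3 together with the local Gram-matrix solves of Step 5. QR orthogonalizes the columns in the same order but also normalizes them, so the resulting columns differ from those of the algorithm by scaling. The function name and arguments are hypothetical, and each aggregate is assumed to carry at least $n_p$ degrees of freedom.

```python
import numpy as np

def tentative_prolongator_from_vectors(aggregates, dofs_of_node, B):
    """Build a tentative prolongator from user-supplied vectors.

    aggregates   : list of lists of node indices (a disjoint covering)
    dofs_of_node : function mapping a node to its list of degree-of-freedom indices
    B            : array of shape (n, n_p); column 0 represents the constant,
                   the remaining columns the other supplied polynomials
    Returns (P, Bc) with B = P @ Bc, where Bc holds the coarse representations.
    """
    n, n_p = B.shape
    P = np.zeros((n, n_p * len(aggregates)))
    Bc = np.zeros((n_p * len(aggregates), n_p))
    for i, agg in enumerate(aggregates):
        D = [k for node in agg for k in dofs_of_node(node)]   # index set D_i
        Q, R = np.linalg.qr(B[D, :])    # local orthogonalization in column order
        cols = slice(n_p * i, n_p * (i + 1))
        P[D, cols] = Q                  # restricted, orthogonalized vectors
        Bc[cols, :] = R                 # coarse-level representation of B
    return P, Bc
```

Since the aggregates are disjoint, columns of `P` belonging to different aggregates have disjoint supports, which is the block structure of the Gram matrix noted in Step 5.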

Remark 4.1. Note that the final smoothed coarse basis functions resemble the 
standard shape functions for the Hermitean element with one degree of freedom for the 
value at the node and one degree of freedom for each derivative. This is true regardless 
of the choice of basis functions in the original problem (finest level), and makes an 
algebraic coarsening possible. 

For the results of application of Algorithm 3, see Experiments 6 and 7 in Section 5. 

Remark 4.2. Efficient solution in the case of nonscalar problems of second order 
may also need the use of the coarsening technique described in this section. For example, 
in the case of 3D elasticity, the energy norm is not equivalent to the $(H^1)^3$-seminorm on the 
factor space modulo constants in each field in the local sense, and consequently, the 
approximation property of the coarse space depends on the global constant of V-ellipticity, 
which can be very small if, for example, displacements are prescribed only on a rather 
small part of the boundary. 

In order to eliminate the dependence of the convergence on boundary conditions, 
we need the prolongator to support the local kernels of the form, which will typically 
assure the desired local equivalence on the factor space modulo the kernel (i.e., a local Korn's 
inequality on macroelements). 

Thus, it is reasonable to build prolongators supporting the entire local kernels of the 
bilinear form instead of just a constant in each field. This can be achieved by supplying 
the representation of the basis vectors of the kernel in place of the vectors $p^{(0)}, \dots, p^{(K-1,d)}$. 

A similar technique that builds the coarse space from local generators of the nullspace 
is used in the so-called Balancing Domain Decomposition [7]. 

5. Numerical Experiments. The experiments in this section demonstrate the 
favorable behavior of the method. The code is available through anonymous ftp to 
tiger.denver.colorado.edu, directory /pub/faculty/pvanek. The experiments were 
performed on an IBM RS-6000/360 with 128 MBytes of memory. 
experiment No.   rate of convergence   algebraic complexity   CPU time    real time 
1                0.08                  1.23                   5s          5s 
2                0.10                  1.56                   768s        7892s 
3                0.21                  1.14                   134s        233s 
4 a/b/c          0.11/0.10/0.10        1.65/1.65/1.65         85/85/85s   95/96/91s 
5                0.09                  1.24                   13s         13s 
6                0.26                  1.37                   64s         77s 
7                0.31                  1.48                   114s        121s 

Table 5.1 
Results of numerical experiments. 


The residual was measured in the $l^2$ norm. The iteration process was stopped once 
the relative residual became smaller than 10~^. In all the experiments the V(1,1) cycle 
has been used. By algebraic complexity we mean the number of nonzero entries in the 
matrices on all the levels divided by the number of nonzeros in the matrix on the finest level. 

The rate of convergence is computed as an average reduction of the $l^2$-norm of the residual 
per iteration. 
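
The two reported quantities can be computed as in the following sketch. The function names are hypothetical, and taking the "average reduction per iteration" to be the geometric mean of the residual reduction factors is an assumption, since the text does not say which average is used.

```python
def algebraic_complexity(nnz_per_level):
    """Nonzeros on all levels divided by the nonzeros on the finest level (index 0)."""
    return sum(nnz_per_level) / nnz_per_level[0]

def average_convergence_rate(residual_norms):
    """Geometric-mean reduction of the residual norm per iteration (an assumption)."""
    iterations = len(residual_norms) - 1
    return (residual_norms[-1] / residual_norms[0]) ** (1.0 / iterations)
```

For instance, a hierarchy with 100, 20 and 3 nonzeros on its three levels has algebraic complexity 1.23.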

Results of experiments are summed up in Table 5.1. The description of testing 
problems follows. 

Experiment No. 1: Planar elasticity on unstructured mesh (Fig. 5.1). Poisson 
ratio 0.3, number of nodes 10610, number of degrees of freedom 21358. Boundary 
conditions : Dirichlet and Neumann. 

Experiment No. 2: Large anisotropic problem (5.1) with jumps in coefficients as 
in Fig. 5.2 and $q(x, y) = 0$. Number of nodes $10^6$. The problem has been discretized on 
the regular square grid. 

Experiment No. 3: 3D problem (5.2) with random coefficients 

rcii = exp(mi), W 22 = exp(rn 2 ), ^33 = exp(rn 3 ), 

where rn, is a random number uniformly distributed in the interval [ln(10“^),In(lO^)]. 
Number of nodes 68921. The problem was discretized on the regular square grid. 

~= fi^^y) on (o,i) x (o,i), 

u = 0 on do,. 


£^{wij{x,y)^) = f{x,y), on (0,1) x (0,1), 


vdu 


u 


0 on 50. 


(5.2) 
Fig. 5.1. Mesh 1 (Courtesy of Charbel Farhat, Center for Aerospace Engineering, University of 
Colorado, Boulder). 


Experiment No. 4: 2D anisotropic problem (5.1) with jumps in coefficients as in 
Fig. 5.2 and a) $q(x, y) = 0.1$, b) $q(x, y) = 1$, c) $q(x, y) = 10$. Number of nodes 160000. 
The problem was discretized on the regular square grid. 

Experiment No. 5: Planar elasticity on an unstructured mesh (Fig. 5.1) 
discretized by finite elements with randomly scaled basis. Poisson ratio 0.3, number of 
nodes 10610, number of degrees of freedom 21358. Boundary conditions: Dirichlet and 
Neumann. 

Experiment No. 6: Biharmonic problem discretized on the regular square 
grid. Number of degrees of freedom 48400. Boundary conditions: essential. 

Experiment No. 7: Fourth order problem (5.3) with coefficients given by (5.4) 
discretized on the regular square grid. Number of degrees of freedom 48400. Boundary 
conditions: essential. 

dx"^ 




dx'^ 



f{x,y) on (0,1) X (0,1) 


(5.3) 
Fig. 5.2. The coefficients $a(x, y)$, $b(x, y)$. Values in the subregions shown: $a = 10^{-2}$, $b = 10^{2}$; $a = 10^{2}$, $b = 10^{-2}$; $a = 1$, $b = 1$. 


a{x,y) = l, b{x,y) = e^^^^ (5.4) 

The second order problems are discretized by P1 finite elements. The fourth order 
problems are discretized by a 27-point difference formula with Lagrangean degrees of 
freedom. 
Abstract 

We consider numerical solution methods for the incompressible 
Navier-Stokes equations discretized by a finite volume method on staggered grids in 
general coordinates. We use Krylov subspace and multigrid methods as well 
as their combinations. Numerical experiments are carried out on a scalar and 
a vector computer. Robustness and efficiency of these methods are studied. 
It appears that good methods result from suitable combinations of GCR and 
multigrid methods. 


1 Introduction 


We compare various iterative methods for linear systems resulting from discretization 
of the time-dependent incompressible Navier-Stokes equations. Before discretization 
the physical domain is mapped onto a computational domain consisting of a number 
of rectangular blocks. In this paper we restrict ourselves to the one-block case and 
two space dimensions. For the space discretization we use finite volumes and a 
staggered grid. For the time discretization we use the Euler Backward finite difference 
scheme together with pressure correction. 

Krylov subspace and multigrid methods are two types of promising iterative 
methods for the solution of large unsymmetric non-diagonally dominant linear systems 
of algebraic equations. These types of methods are much used to solve discretized 
Navier-Stokes equations. Our research using Krylov subspace methods is described in 
([10], [11], [12]) and using multigrid methods is described in ([14], [15], [16], [4] - [6]). 
As the Krylov subspace method we choose the GMRESR method [9] (a combination of 
GCR [1] and GMRES [7]). For the multigrid method we use a Galerkin coarse grid 
approximation and two different smoothers. 

Since many of the faster computers are vector computers, we also compare the 
vectorization properties of the different methods. Although parallel computers will probably 
supersede vector computers in the near future, the comparison will remain 
relevant because good vectorization properties imply in many cases good 
parallelization properties. Furthermore, vectorization aspects remain of interest because 
future high-performance parallel computing platforms will often contain vector 
processors. Finally, good vectorization normally implies good superscalar performance 
on many RISC processors. Note that GMRESR is easy to vectorize, since most of its 
arithmetic operations are vector updates, vector-vector and matrix-vector operations. 
Vector length becomes large as the grid is refined, which improves speed on vector 
computers. With respect to multigrid we have the following choices: 

- use of a simple smoother, like point Jacobi, which is easily vectorized but not 
robust, or 

- use of a more complicated smoother, like ILU, which is robust but harder to 
vectorize. 

A disadvantage of multigrid methods is that the occurrence of vectors of short length 
is inevitable, since use of coarse grids is necessary. This diminishes multigrid efficiency 
on vector computers. 

The foregoing observations on the advantages and disadvantages of the two types 
of methods suggest that combinations of them may be profitable. We compare the 
following methods: 

Method 1: GMRESR with ILU preconditioning; 

Method 2: Multigrid with Jacobi line smoothing; 

Method 3: Multigrid with ILU smoothing; 

Method 4: GCR with Method 2 as inner loop; 

Method 5: GCR with Method 3 as inner loop. 

In this paper, general boundary-fitted coordinates are used to compute flows in 
complicated geometries. In general coordinates, the incompressible Navier-Stokes 
equations are formulated in standard tensor notation as follows [8]: 

dU°‘ 

momentum equations —|- + g^^U'p) 

continuity equation U" = 0, (2) 

where $U^\alpha$ is the contravariant representation of the velocity vector field, $p$ the pressure, 
Re the Reynolds number, and $g^{\alpha\beta}$ the metric tensor. The range of Greek indices is 
{1,2}. We use a staggered grid arrangement and a lexicographic ordering of the grid 
points. Due to the use of virtual cells the number of $U^1$-, $U^2$- and $p$-points is the 
same. Using finite volume discretization in space and the backward Euler method for 
time discretization, we obtain the following discrete systems at each time step (see 
[8] for details): 


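$$\frac{1}{\Delta t} \begin{pmatrix} u_1^{n+1} - u_1^n \\ u_2^{n+1} - u_2^n \end{pmatrix} + \begin{pmatrix} A^{11} & A^{12} & A^{13} \\ A^{21} & A^{22} & A^{23} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ p \end{pmatrix}^{n+1} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix},$$   (3) 

$$\begin{pmatrix} A^{31} & A^{32} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}^{n+1} = 0,$$   (4) 

where $u_1$, $u_2$ and $p$ are algebraic vectors that approximate on the grid $\sqrt{g}\,U^1$, 
$\sqrt{g}\,U^2$ and $p$, respectively, with $\sqrt{g}$ the Jacobian of the mapping, and $f_1$ and $f_2$ 
represent source terms. The nonlinear terms have been linearized with Newton's method. 
The linear operators $(A^{31}\ A^{32})$, resulting from discretization of the divergence 
operator in the continuity equation, and $A^{13}$ and $A^{23}$, resulting from discretization of the 
gradients of the pressure in the momentum equations, do not depend on time. The 
remaining operators are time-dependent. 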


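Equations (3) and (4) are solved by the pressure correction method, as presented
in [3], which consists of three steps. In the first step, the momentum equations are
solved to give an intermediate value $u^*$ for the velocities, using the old pressure:

$$\begin{pmatrix} \frac{1}{\Delta t} I + A^{11} & A^{12} \\ A^{21} & \frac{1}{\Delta t} I + A^{22} \end{pmatrix} \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}^{n+1} + \frac{1}{\Delta t} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}^{n} - \begin{pmatrix} A^{13} \\ A^{23} \end{pmatrix} p^{n}. \tag{5}$$

This equation system behaves like a discretization of a convection-diffusion equation.
The main diagonal is enhanced by a contribution $1/\Delta t$ due to the time derivative.
Then the pressure equation, which is derived from the momentum equation (3) and
the continuity equation (4), is solved to give the difference $p^{n+1} - p^{n}$:

$$\begin{pmatrix} A^{31} & A^{32} \end{pmatrix} \begin{pmatrix} A^{13} \\ A^{23} \end{pmatrix} (p^{n+1} - p^{n}) = \frac{1}{\Delta t} \begin{pmatrix} A^{31} & A^{32} \end{pmatrix} \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix}. \tag{6}$$

The coefficient matrix of $p^{n+1} - p^{n}$ does not change with time, and resembles a
discretization of the Laplacian operator (in general coordinates), but is not symmetric.
Finally, the velocities at time step $n + 1$ are computed by means of

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}^{n+1} = \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix} - \Delta t \begin{pmatrix} A^{13} \\ A^{23} \end{pmatrix} (p^{n+1} - p^{n}). \tag{7}$$

In the next section we describe the iterative methods used for the solution of (5) and
(6).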
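The three steps of the pressure correction method can be sketched for small dense matrices as follows. This is only an illustration under stated assumptions: the names `A` (the momentum block), `G` (the stacked pressure-gradient operators), `D` (the divergence operator) and the function `pressure_correction_step` are hypothetical, and direct solves stand in for the iterative methods that the paper actually studies.

```python
import numpy as np

def pressure_correction_step(A, G, D, u_old, p_old, f, dt):
    """One time step of a three-stage pressure correction scheme (sketch).

    A : (n, n) linearized momentum operator (assumed name)
    G : (n, m) discrete pressure gradient   (assumed name)
    D : (m, n) discrete divergence          (assumed name)
    """
    n = A.shape[0]
    # Step 1: intermediate velocity from the momentum equations, old pressure.
    u_star = np.linalg.solve(np.eye(n) / dt + A, u_old / dt + f - G @ p_old)
    # Step 2: pressure equation for the increment dp = p_new - p_old.
    dp = np.linalg.solve(D @ G, (D @ u_star) / dt)
    # Step 3: velocity correction, which makes the new velocity divergence-free.
    u_new = u_star - dt * (G @ dp)
    return u_new, p_old + dp
```

With exact solves in steps 1 and 2, the corrected velocity satisfies the discrete continuity equation to rounding error, which is a convenient check when experimenting with inexact inner solvers.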
2 Solution Methods


In this section the iterative methods to be tested are described. The GMRESR 
method combined with ILU type preconditioners is given in Subsection 2.1. This 
is a summary of the methods described in [12]. In Subsections 2.2.1 and 2.2.2, the 
multigrid methods using an alternating Jacobi line smoothing and an ILU smoothing 
are presented. New methods, consisting of combinations of GMRESR and multigrid, 
are proposed in Subsection 2.2.3. 


2.1 Method 1: GMRESR with ILU preconditioning 


In Section 1 we have seen that there are two types of linear systems to be solved:
the momentum equations and the pressure equation. Each has its own characteristic
properties. We use GMRESR for both but with different preconditioners. The GMRESR
method is defined in [9], successfully applied to the Navier-Stokes equations
in [11], and analysed further in [10]. The GMRESR algorithm can be formulated as
follows:


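Algorithm GMRESR
$r_0 = b - A x_0$, $k = -1$
while $\|r_{k+1}\| / \|r_0\| > tol$ do
    $k := k + 1$
    apply one iteration of GMRES($m$) to $A y_k = r_k$ and
    denote the result by $u_k^{(0)}$
    $c_k^{(0)} = A u_k^{(0)}$
    for $i = 0, 1, \ldots, k - 1$ do
        $\alpha_i = c_i^T c_k^{(i)}$
        $c_k^{(i+1)} = c_k^{(i)} - \alpha_i c_i$,  $u_k^{(i+1)} = u_k^{(i)} - \alpha_i u_i$
    end do
    $c_k = c_k^{(k)} / \|c_k^{(k)}\|_2$,  $u_k = u_k^{(k)} / \|c_k^{(k)}\|_2$
    $x_{k+1} = x_k + u_k c_k^T r_k$
    $r_{k+1} = r_k - c_k c_k^T r_k$
end while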

GMRESR consists of a GCR outer loop and a GMRES inner loop. In every outer
iteration, $m$ iterations are used in the GMRES inner loop. Only in the final outer
iteration is it possible to do fewer than $m$ inner iterations (see [9]). In this paper the
GMRESR algorithm is used with the 'min alfa' truncation strategy (see [10]). A
truncation strategy is necessary to restrict the required memory. Truncation means
the following: choose the number (ntrunc) of search directions ($u_k$) that may be kept
in memory. If the number of iterations becomes larger than ntrunc, a search direction
$u_j$ and its companion $c_j$ ($= A u_j$) are overwritten by the new search direction
$u_{k+1}$ and $c_{k+1}$. The min alfa truncation strategy is a method to decide which search
direction should be discarded by the following criterion: find $j$ such that $\alpha_j$
satisfies the following equation:

$$|\alpha_j| = \min_{0 \le i < ntrunc} |\alpha_i|. \tag{8}$$

To obtain an efficient solver, GMRESR is combined with a preconditioner. For the
pressure equation we use the classic incomplete LU decomposition (all fill-in is
neglected). For the details of this preconditioner and the combination with GMRESR
we refer to [12]. We use an ILUD preconditioning for the momentum equations. In
this type of preconditioning the off-diagonal parts of L and U are the same as those of
the given matrix and only the diagonal is adapted. In all the numerical experiments
given in Section 3, we use the GMRESR(5) method (so $m = 5$).
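The GCR outer loop of the algorithm above can be sketched with a pluggable inner solver. This is a minimal illustration, not the authors' implementation: the function names `gcr` and `jacobi_sweeps` are hypothetical, no truncation is applied, and a few damped Jacobi sweeps stand in for GMRES(m) (or for a multigrid cycle, as in the combined methods).

```python
import numpy as np

def jacobi_sweeps(A, r, sweeps=3, omega=0.7):
    """Cheap stand-in inner solver: damped point Jacobi for A y = r."""
    d = np.diag(A)
    y = np.zeros_like(r)
    for _ in range(sweeps):
        y = y + omega * (r - A @ y) / d
    return y

def gcr(A, b, inner, tol=1e-8, max_outer=50):
    """GCR outer loop; inner(r) returns an approximate solution of A y = r."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    r0_norm = np.linalg.norm(r)
    us, cs = [], []                      # search directions u_i and c_i = A u_i
    for _ in range(max_outer):
        if np.linalg.norm(r) <= tol * r0_norm:
            break
        u = inner(r)
        c = A @ u
        for ui, ci in zip(us, cs):       # orthogonalize c against earlier c_i
            alpha = ci @ c
            c = c - alpha * ci
            u = u - alpha * ui
        norm_c = np.linalg.norm(c)
        if norm_c == 0.0:                # breakdown: inner solver returned nothing useful
            break
        c, u = c / norm_c, u / norm_c
        x = x + u * (c @ r)
        r = r - c * (c @ r)
        us.append(u)
        cs.append(c)
    return x, len(us)
```

Because the inner solver is only called through `inner(r)`, it may change from one outer iteration to the next, which is the flexibility the GMRESR idea relies on.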


2.2 Multigrid methods 


In this paper we use multigrid methods consisting of the F-cycle with one pre- and one 
post-smoothing. In Subsection 2.2.1 the coarse grid operators are defined. The two 
smoothing operators used are given in Subsection 2.2.2, corresponding to Methods 2 
and 3. In Subsection 2.2.3 the combined methods are given. 


2.2.1 Formulation of coarse grid operators 


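Coarse grid operators are formulated by means of Galerkin coarse grid approximation
[13]. For brevity, we write equations (5) and (6) as

$$\begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \tag{9}$$

$$A^{33} p = f_3. \tag{10}$$

Let $l$ be the grid index, with $l = 1$ indicating the coarsest grid. Galerkin coarse grid
approximation is carried out from grid $l + 1$ to grid $l$ as follows:
momentum equations

$$\begin{pmatrix} A^{11(l)} & A^{12(l)} \\ A^{21(l)} & A^{22(l)} \end{pmatrix} = \begin{pmatrix} R^1 A^{11(l+1)} P^1 & R^1 A^{12(l+1)} P^2 \\ R^2 A^{21(l+1)} P^1 & R^2 A^{22(l+1)} P^2 \end{pmatrix}, \qquad \begin{pmatrix} f_1^{(l)} \\ f_2^{(l)} \end{pmatrix} = \begin{pmatrix} R^1 r_1^{(l+1)} \\ R^2 r_2^{(l+1)} \end{pmatrix}, \tag{11}$$

and pressure equation

$$A^{33(l)} = R^3 A^{33(l+1)} P^3, \qquad f_3^{(l)} = R^3 r_3^{(l+1)}. \tag{12}$$

The $r$'s are the residuals, for example, $r_3 = f_3 - A^{33} p$. Here the $R$'s and $P$'s are
restriction operators and prolongation operators, which are described below.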
Standard cell-centered coarsening is used; a cell on the next coarse grid is formed by
taking the union of four fine grid cells. The restriction operators $R^1$ and $R^2$ are for the
momentum equations and $R^3$ for the pressure equation. The prolongation operators
$P^1$, $P^2$ and $P^3$ are applied to $u_1$, $u_2$ and $p$, respectively. The prolongation used for
the coarse grid corrections is the same as in Galerkin coarse grid approximation.


The operators $R^1$ and $R^2$ use so-called hybrid interpolation, which, for example for
$R^1$, is obtained by using the adjoint of linear interpolation for $u_1$ in direction 1 but
the adjoint of piecewise constant interpolation in direction 2. Operator $R^3$ is simply
the adjoint of piecewise constant interpolation. Operators $R^1$ and $R^3$ are given by


$$[R^1] = \frac{1}{2} \begin{bmatrix} w & 2 & e \\ w & 2 & e \end{bmatrix}, \qquad [R^3] = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \tag{13}$$


where $w = 0$ when the 'west' points are on or outside of the 'west' boundary and
$w = 1$ elsewhere, and similarly for $s$, $e$ and $n$. $R^2$ is similar to $R^1$. The elements
with an underscore correspond to the fine grid point $2k$ when restriction results in a
function value in the coarse grid point $k$. The prolongation operators $P^1$, $P^2$ and $P^3$
employ bilinear interpolation. The adjoints $P^{1*}$ and $P^{3*}$ of $P^1$ and $P^3$ are given by:




$$[P^{1*}] = \frac{1}{8} \begin{bmatrix} nw & 2n & ne \\ (4-n)w & 2(4-n) & (4-n)e \\ (4-s)w & 2(4-s) & (4-s)e \\ sw & 2s & se \end{bmatrix},$$

$$[P^{3*}] = \frac{1}{16} \begin{bmatrix} nw & n(4-w) & n(4-e) & ne \\ (4-n)w & 16-4(n+w)+nw & 16-4(n+e)+ne & (4-n)e \\ (4-s)w & 16-4(s+w)+sw & 16-4(s+e)+se & (4-s)e \\ sw & s(4-w) & s(4-e) & se \end{bmatrix}, \tag{14}$$

and $P^{2*}$ is similar to $P^{1*}$. For a more detailed exposition of these transfer operators,
see [13] and [16].
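The Galerkin construction $A^{(l)} = R A^{(l+1)} P$ can be illustrated in one dimension. The sketch below is an assumption-laden simplification, not the operators of the paper: it uses a 1D cell-centered grid, the adjoint of piecewise constant interpolation for restriction, and cell-centered linear interpolation for prolongation, with weights outside the boundary simply dropped. The function names are hypothetical.

```python
import numpy as np

def restriction_pc(nc):
    """Adjoint of piecewise constant interpolation: coarse cell k sums fine cells 2k, 2k+1."""
    R = np.zeros((nc, 2 * nc))
    for k in range(nc):
        R[k, 2 * k] = R[k, 2 * k + 1] = 1.0
    return R

def prolongation_linear(nc):
    """Cell-centered linear interpolation with weights 3/4 and 1/4 (1D analogue of bilinear)."""
    P = np.zeros((2 * nc, nc))
    for k in range(nc):
        P[2 * k, k] = P[2 * k + 1, k] = 0.75
        if k > 0:
            P[2 * k, k - 1] = 0.25       # left fine cell leans on the left coarse neighbour
        if k < nc - 1:
            P[2 * k + 1, k + 1] = 0.25   # right fine cell leans on the right coarse neighbour
    return P

def galerkin(A_fine, R, P):
    """Galerkin coarse grid approximation A_coarse = R A_fine P."""
    return R @ A_fine @ P
```

Applied to a fine-grid operator whose interior rows sum to zero, the coarse operator inherits that property away from the boundaries, since the prolongation reproduces constants there.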


2.2.2 The smoothing operators 

In this subsection we describe the smoothers which are used in the multigrid method:
Jacobi smoothing and ILU smoothing. The reason for this choice is that Jacobi
smoothing has good vectorization (parallelization) properties but is not robust, whereas
the ILU smoothing is robust but not easily vectorized.

Method 2: Multigrid with Jacobi smoothing 

Our Jacobi smoothing method consists of one horizontal Jacobi line iteration followed 
by one vertical Jacobi line iteration. The momentum equations are smoothed in a 
decoupled way, i.e., the two momentum equations are smoothed successively. In a 
horizontal smoothing iteration, mutually independent tridiagonal systems have to be 
solved: $M_j \delta x_j = r_j$ for a horizontal line $j$. The three non-zero elements at row $i$ in
$M_j$ are denoted by $l_{i,j}$, $d_{i,j}$, and $u_{i,j}$. The matrix $M_j$ is factorised into:

$$M_j = (L_j + D_j) D_j^{-1} (D_j + U_j) \tag{15}$$

where $L_j$ and $U_j$ have only one non-zero diagonal below and above the main diagonal,
equal to $l_{i,j}$ and $u_{i,j}$, and $D_j$ is a diagonal matrix. Comparable formulae are used in
a vertical smoothing iteration. Variables are updated after each horizontal and after
each vertical step with a fixed underrelaxation factor $\omega = 0.7$.
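For a single line, the factorization (15) amounts to a short recursion for the diagonal followed by a forward and a backward sweep. The sketch below assumes the three diagonals of one line are stored in arrays `l`, `d`, `u` (with `l[0]` and `u[-1]` unused); these storage conventions and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_line(l, d, u, r):
    """Solve M x = r for tridiagonal M via M = (L + D) D^{-1} (D + U)."""
    n = len(d)
    dd = np.empty(n)                     # entries of the diagonal matrix D
    z = np.empty(n)
    dd[0] = d[0]
    z[0] = r[0] / dd[0]
    for i in range(1, n):
        dd[i] = d[i] - l[i] * u[i - 1] / dd[i - 1]
        z[i] = (r[i] - l[i] * z[i - 1]) / dd[i]      # forward sweep: (L + D) z = r
    x = np.empty(n)
    x[-1] = z[-1]
    for i in range(n - 2, -1, -1):
        x[i] = z[i] - u[i] * x[i + 1] / dd[i]        # backward sweep: (D + U) x = D z
    return x

def relax_line(x, l, d, u, r, omega=0.7):
    """Underrelaxed update of one line, given its current residual r."""
    return x + omega * solve_line(l, d, u, r)
```

All lines of one direction are independent of each other, so in a vectorized code the recursion index `i` would typically run sequentially while the line index is the vector direction.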

Method 3: Multigrid with ILU smoothing 

Suppose that the equation to be smoothed is denoted by 

Ax = b. (16) 

A smoothing iteration is given by 

$$\delta x = M^{-1}(b - Ax), \qquad x := x + \omega\, \delta x \tag{17}$$

with $\omega = 0.8$ fixed. For the ILU smoothing we choose $M = (L + D) D^{-1} (D + U)$,
where L and U are strictly lower and upper triangular matrices, and D a diagonal 
matrix. Matrices L and U have non-zero entries in the positions corresponding to 
the standard 9-point stencil pattern and are chosen such that the elements of M 
belonging to the 9-point pattern are equal to the corresponding elements of A. The 
momentum equations are smoothed in the same decoupled manner as in Method 2. 
Again, factorization takes place only at the beginning of multigrid iterations for a 
time step, and L, D and U are kept until the next time step. 
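The damped iteration (17) with $M = (L + D)D^{-1}(D + U)$ can be sketched as below. Note the swapped-in simplification: instead of the 9-point ILU described above, this sketch takes L and U as the strict lower and upper parts of A and adapts only the diagonal, in the spirit of the ILUD variant mentioned in Subsection 2.1. Function names are hypothetical and dense triangular solves are used purely for brevity.

```python
import numpy as np

def ilud_diagonal(A):
    """Diagonal D such that diag((L + D) D^{-1} (D + U)) equals diag(A)."""
    n = A.shape[0]
    D = np.empty(n)
    for i in range(n):
        D[i] = A[i, i] - sum(A[i, j] * A[j, i] / D[j] for j in range(i))
    return D

def smooth(A, b, x, D, omega=0.8, sweeps=1):
    """Damped iteration x := x + omega * M^{-1} (b - A x)."""
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    for _ in range(sweeps):
        r = b - A @ x
        z = np.linalg.solve(L + np.diag(D), r)        # (L + D) z = r
        dx = np.linalg.solve(np.diag(D) + U, D * z)   # (D + U) dx = D z
        x = x + omega * dx
    return x
```

As in the text, `D` would be computed once per time step and reused for every smoothing sweep within that step.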

2.2.3 The combined methods 

The methods presented below are very flexible. In many other combinations of Krylov 
subspace and multigrid methods, the inner loop procedure must be the same for ev¬ 
ery outer loop iteration. In these methods this is not necessary, so in different outer 
iterations one may use different inner loops, for instance a mix of GMRES and multi¬ 
grid, or a different number of iterations with multigrid or multigrid with different 
smoothers, etc. The methods are based on the GMRESR idea where we use a GCR 
outer loop and a GMRES inner loop. The algorithms for the new methods are given 
below and only differ in the construction of the new search directions. 

Method 4: GCR with Method 2 as inner loop 

This method is obtained by replacing GMRES(m) in the inner loop of Method 1, by 
Method 2. 

Method 5: GCR with Method 3 as inner loop 

This method is obtained by replacing GMRES(m) in the inner loop of Method 1, by 
Method 3. 
3 Numerical Experiments


3.1 Test Problems 


We consider four test problems: an oblique driven cavity problem, an L-shaped driven
cavity problem, a backward facing step problem [2], and a 90° bend problem [11]. The
grids used for these problems are shown in Figure 1. We study these problems for
various time steps and grid sizes. Furthermore, for every problem two values of the
Reynolds number are used. For the driven cavity problems we take $Re_{low} = 1$ and
$Re_{high} = 1000$, in the backward facing step problem $Re_{low} = 50$ and $Re_{high} = 150$,
whereas in the bend problem $Re_{low} = 500$ and $Re_{high} = 1000$. The number of time
steps is fixed at 40. This number is a rather arbitrary choice, because our purpose
here is not to solve problems until steady state, but to investigate the performance 
(efficiency and robustness) of solution methods. Based on numerical experiments, the 
following stop criterion is chosen: the iterative solution of the systems at each time 
step is terminated if the ratio of the norm ||r|| of the residual to the norm ||ro|| of 
the residual at the beginning of the present time step satisfies $\|r\| / \|r_0\| < tol$, with
tol = 10“^ for the momentum equations and tol = 10“® for the pressure equation. In
Subsection 3.2 experiments on a scalar computer are described whereas Subsection 
3.3 contains the results on a vector computer. 

3.2 Experiments on a scalar computer 

In this subsection we present numerical experiments on an HP 735 computer. We have 
run all methods described in Section 2 for the test problems given in Subsection 3.1. 
For brevity, here we only present a representative subset of the results. In Subsection 
3.2.1 the momentum equations are considered, whereas in Subsection 3.2.2 we show 
results for the pressure equation. 

3.2.1 The momentum equations 

The properties of the linear systems originating from the discretized momentum equa¬ 
tions that influence the iterative solvers depend on: the size of the time step, the 
Reynolds number, the grid size, and the shape of the space domain. Below, the in¬ 
fluence of these parameters is considered in more detail. In the first part we restrict 
ourselves to the oblique driven cavity problem, only in the final part results are given 
for all test problems. The reason for this is that the results for the other problems 
are comparable with those of the oblique driven cavity problem. 

Dependence on At, the Reynolds number and the grid size 
In Table 1 we give some measurements concerning Method 1 and Method 3 applied 
Figure 1: Grids for the four test problems: a. The oblique driven cavity problem
(32 x 32); b. The L-shaped driven cavity problem (32 x 32); c. The backward facing
step problem (48 x 16); d. The 90° bend problem (16 x 64).
                   Re = 1                                  Re = 1000
Grid       Δt      t_t    t_v, t_p   k_v, k_p ρ_v, ρ_p     t_t    t_v, t_p   k_v, k_p ρ_v, ρ_p

Method 1
32 x 32    .0625   19     7, 7       5, 9     -            13     1, 7       1, 8     -
32 x 32    .125    19     7, 7       5, 8     -            14     2, 7       2, 8     -
32 x 32    .25     19     8, 7       5, 8     -            15     3, 7       3, 8     -
64 x 64    .0625   151    74, 57     8, 13    -            90     14, 56     2, 12    -
64 x 64    .125    158    81, 57     9, 12    -            93     18, 55     3, 11    -
64 x 64    .25     162    86, 57     10, 13   -            104    28, 56     4, 12    -
128 x 128  .0625   1501   774, 642   14, 22   -            830    97, 648    2, 22    -
128 x 128  .125    1617   879, 653   18, 22   -            870    132, 652   4, 22    -
128 x 128  .25     1655   917, 653   20, 23   -            951    213, 653   6, 23    -

Method 3
32 x 32    .0625   74     26, 35     4, 14    .246, .371   63     20, 30     3, 12    .0871, .366
32 x 32    .125    74     27, 35     4, 14    .240, .373   66     24, 30     4, 11    .142, .313
32 x 32    .25     74     27, 34     4, 14    .226, .372   72     29, 30     6, 11    .250, .346
64 x 64    .0625   257    ?          4, 14    .229, .370   224    76, 110    3, 12    .0933, .351
64 x 64    .125    255    ?          4, 14    .215, .371   240    91, 111    4, 11    .138, .366
64 x 64    .25     252    97, 117    4, 13    .200, .370   ?      93, 109    4, 11    .214, .357
128 x 128  .0625   1073   424, 499   4, 13    .203, .370   ?      395, 470   4, 10    .163, .354
128 x 128  .125    1058   425, 484   4, 13    .194, .370   1056   ?          4, 11    .179, .338
128 x 128  .25     1045   425, 470   4, 12    .190, .368   1099   ?          4, 12    .191, .358

Table 1: The oblique driven cavity problem on the HP: the total CPU time $t_t$, the
CPU times $t_v$ and $t_p$ for the solution of the momentum equations and the pressure
equation, respectively, the numbers of iterations $k_v$ and $k_p$ in the final time step, and
the reduction factors $\rho_v$ and $\rho_p$ of the multigrid algorithm in the last iteration in the
final time step.


to the oblique driven cavity problem. The behaviour of the other methods is comparable
to that of Method 3. We observe that the number of iterations of Method 3 is more or
less independent of the choice of $\Delta t$, $Re$, or the grid size.

Now we consider how Method 1 (GMRESR) depends on these choices.
The main diagonal of the momentum matrix is enhanced by a contribution $1/\Delta t$ due
to the time derivative, so for small $\Delta t$ the matrix is diagonally dominant. It appears
from Table 1 that the number of iterations of the GMRESR method grows if $\Delta t$
increases. Comparing the results for the two Reynolds numbers, it appears that GMRESR
converges much faster for $Re = 1000$ than for $Re = 1$. Finally, as expected,
the number of GMRESR iterations increases for increasing grid size.
Figure 2: CPU times per grid point on the HP for the momentum equations during
40 time steps, for $Re_{low}$ and $\Delta t$ = 0.0625.


Problem dependence and comparison 

For a comparison of the various methods on the four test problems we plot the CPU 
time on an HP 735 per grid point for 40 time steps against the grid size. In these 
figures we use the following symbols: 


Method 1: solid lines and point marks,
Method 2: dotted lines and circles,
Method 3: dashed lines and stars,
Method 4: dotted lines and plus marks,
Method 5: dashed lines and x-marks.


Where no symbols are shown they are off-scale. For $Re_{low}$ the results are given in
Figure 2 and for $Re_{high}$ the results are given in Figure 3.

First we discuss the combination of GCR and multigrid. From Figures 2 and 3 it 
appears that the GCR acceleration of the Jacobi smoothed multigrid is better than 
multigrid itself. If the smoother is sufficiently powerful, as for instance for Method 3, 
where we use an ILU smoother, then the combination of GCR and multigrid gives a 
slightly worse performance. In these cases, the number of iterations is the same but 
the CPU time increases somewhat due to the GCR overhead. 
Figure 3: CPU times per grid point on the HP for the momentum equations during
40 time steps, for $Re_{high}$ and $\Delta t$ = 0.0625.

Secondly, we compare Method 1 with the best multigrid method: Method 3. It ap¬ 
pears that for Method 3 the CPU time per grid point is independent of the grid size 
and the Reynolds number. For Method 1 there is more variation: the CPU time 
increases for a larger grid size and a smaller Reynolds number. For a large Reynolds 
number Method 1 is much faster than Method 3. For the driven cavity problems and 
a small Reynolds number, Method 1 is more efficient for medium grid sizes, whereas 
Method 3 is the best method for large grid sizes. For the oblique driven cavity prob¬ 
lem the break-even point is in the range [64, 128] and for the L-shaped driven cavity 
problem the break-even point is in the range [128, 256]. 

Finally we discuss robustness. Methods 1, 3 and 5 are equally robust. For most prob¬ 
lems they work well. Only for the 90° bend problem there are some failure cases (not 
shown here) when At is large and Re large. The least robust method is Method 2; 
it suffers from convergence problems when either the grid is refined or At is large for 
some problems. But when it is combined with GCR, resulting in Method 4, robust¬ 
ness is improved very much. Sometimes when Method 2 fails to work, Method 4 still 
works rather satisfactorily. However, Method 4 falls behind Methods 1, 3 and 5 for 
Re large. 
Figure 4: CPU times per grid point on the HP for the pressure equation during 40
time steps.


3.2.2 The pressure equation 


The properties of the discretized pressure equation depend only on the grid size
and the shape of the space domain.

Grid size dependence 

The multigrid and combined methods require the same number of iterations for in¬ 
creasing grid size. Again Method 1 depends on the grid size; the number of iterations 
grows for increasing grid size. This is illustrated by Table 1 where the results for the 
oblique driven cavity problem are given. 

Problem dependence and comparison 

The CPU time on an HP 735 per grid point for 40 time steps is shown in Figure 4. It
appears that for both smoothers the combination of GCR and multigrid is more
efficient than multigrid itself. Especially in the oblique driven cavity problem, Method
4 is twice as fast as Method 2. Also for the strong ILU smoother the CPU time
for Method 5 is considerably less than for Method 3.

Finally, we compare Method 1 with the best multigrid method: Method 5. It appears
that Method 1 is more efficient for medium grid sizes, whereas Method 5 is more
efficient for large grid sizes. For the driven cavity problems the break-even point is in
the range [64, 128], whereas for the other problems the break-even point is in the range
[32, 64]. For the pressure equation Method 1 has a superlinear convergence behaviour
[12], which means that the residual is reduced faster in later iterations than in the first
ones. Since the multigrid and combined methods converge linearly, this implies
that decreasing the termination criterion tol would benefit Method 1, and vice versa.


3.3 Experiments on a vector machine 


In this subsection we report on some experiments on a Convex C3840. First, we com¬ 
pare Methods 1, 3 and 5, because they are the best methods on the scalar machine 
and have different vectorization properties. Thereafter, Methods 3 and 5 are com¬ 
pared with Methods 2 and 4 to analyse the performance of methods using a weaker 
smoother but with greater vectorization potential and using a stronger smoother but 
with smaller vectorization capability. 


Comparing the best methods 




Figure 5: CPU times per grid point on the Convex during 40 time steps for the
L-shaped driven cavity problem, with $Re = 1$ and $\Delta t$ = 0.0625. Left: the momentum
equations, right: the pressure equation.

In Figure 5 we present the CPU time per grid point against grid size for the L-shaped
driven cavity problem. To show the effect of an increasing vector length, computations
on a 256 x 256 grid are included. From this figure it appears that the convergence
behaviour of the methods is comparable to that on a scalar machine: the efficiency of
Method 1 deteriorates and that of Methods 3 and 5 improves with grid refinement.
Due to the good vectorization properties of the Krylov methods the break-even point
moves to finer grids and the GCR overhead for the combined methods becomes
negligible. Finally, the curves for Methods 3 and 5 become flatter when going to finer
grids, which indicates that the efficiency gain from a larger vector length is gradually
exhausted.
Comparing the vectorization properties of the smoothers

It appears that the higher Mflop rate of Methods 2 and 4 does not compensate for the
slower rate of convergence, although on the vector machine they compete better than
on the scalar machine. This is true for all test problems and is illustrated with the
momentum equations of the L-shaped driven cavity problem in Table 2. Note that
for a low Reynolds number Methods 2 to 5 are comparable, but for a high Reynolds
number Methods 3 and 5 are superior to Methods 2 and 4. Method 2 does not work
well on finer grids and even fails on the 256 x 256 grid.


             Re = 1                          Re = 1000
grid size    32      64      128     256     32      64      128     256
Method 2     0.039   0.022   0.013   0.011   0.028   0.023   0.045   ∞
Method 3     0.040   0.021   0.012   0.010   0.032   0.017   0.011   0.010
Method 4     0.040   0.020   0.012   0.010   0.032   0.020   0.017   0.017
Method 5     0.043   0.022   0.013   0.010   0.033   0.018   0.012   0.009

Table 2: CPU time per grid point on the Convex during 40 time steps for the
momentum equations for the L-shaped driven cavity problem.


4 Conclusions 

We have investigated numerically five iterative methods, namely, Method 1: GMRESR
(GCR with GMRES as inner loop); Method 2: multigrid with Jacobi line
smoothing; Method 3: multigrid with ILU smoothing; Method 4: GCR with
multigrid with Jacobi line smoothing as inner loop; and Method 5: GCR with multigrid
with ILU smoothing as inner loop, in the context of application to the solution of
the incompressible Navier-Stokes equations in general coordinates on staggered grids,
using the pressure correction method in the time-dependent case.

From our numerical experiments we draw the following conclusions:

- For the solution of the momentum equations with a high Reynolds number 
Method 1 is the best method. 

- For solving the momentum equations with a low Reynolds number Method 1 is 
faster for medium sized grids, whereas Method 3 is the best method for large 
sized grids. 

- For the pressure equation Method 1 is also optimal for medium grid sizes. For 
large grid sizes Method 5 is the most robust and efficient method. 

- The GCR outer loop of Methods 4 and 5 speeds up the rate of convergence, 
especially for weak smoothers (Method 4). 
Finally, we remark that the break-even point, where the efficiency of the Krylov
subspace method is equal to that of the multigrid method, depends on many factors.
Some of them are: the domain of the test problem, the termination criterion, the
Reynolds number, the computer used (scalar, vector, or parallel), etc. In Section 3
we have investigated numerically in which direction the break-even point moves
when one of these factors changes.
AN ALGEBRAIC MULTIGRID SOLVER FOR NAVIER-STOKES
PROBLEMS IN THE DISCRETE SECOND-ORDER 
APPROXIMATION 


R Webster 

Roadside, Harpsdale, Halkirk, Caithness, KW12 6UL, Scotland, UK 


ABSTRACT 

An algebraic multigrid scheme is presented for solving the discrete Navier-Stokes 
equations to second-order accuracy using the defect-correction method. Solutions have 
been obtained for problems involving both structured and unstructured meshes, with the 
resolution and resolution grading controlled by global and local mesh refinements. 

The solver is efficient and robust to the extent that no underrelaxation of variables has 
been required to ensure convergence, but rates of convergence can be improved with small 
amounts of underrelaxation of the velocity-pressure coupling. Provided that the 
computational mesh can resolve the flow field, convergence characteristics are almost mesh 
independent. Rates of convergence actually improve with refinement, asymptotically 
approaching mesh independent values. For extremely coarse meshes where dispersive 
truncation errors would be expected to prevent convergence (or even induce divergence), 
solutions can still be obtained by using explicit underrelaxation in the iterative cycle. 


INTRODUCTION 


Solution of the equations of motion for viscous fluids in the discrete approximation
demands powerful computing resources. This is because the flow fields of practical interest
are invariably complex and require a high degree of spatial resolution. Resolution of length
scales that span many orders of magnitude may be necessary even for stable laminar flows.
If $Q$ is some measure of the linear resolving power of a discretisation (such as an
appropriately scaled inverse of the nodal separation), then the number of discrete equations
to be solved, $N$, will scale as

$$N \sim Q^d \tag{1}$$

where $d$ is the number of spatial dimensions. Since, moreover, the computational work will
scale as $N^P$, where $P$ depends on the solution method ($P \ge 1.0$), the required computing
time, $T$, will scale as

$$T \sim Q^{Pd} \tag{2}$$

Clearly $T$ can be a very strong function of the required resolution. For example, for 3D
finite-element problems that require direct solution methods (such as Gaussian elimination),
the exponent can be as large as 9 (i.e., $P = 3$, $d = 3$). Since in fluid dynamics we are looking
for orders-of-magnitude improvements in resolution it is essential to develop efficient
solvers with optimum scaling ($P = 1.0$). It is also important that this scaling hold good for
non-uniform, unstructured meshes so that the nodal economy can be maximised by
matching the density of nodes to the required resolution, which may be both anisotropic and
inhomogeneous.
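As a small worked illustration of equation (2) (the factor of two is an arbitrary choice made here, not a figure from the paper), doubling the resolving power $Q$ of a 3D problem multiplies the computing time by

$$\frac{T(2Q)}{T(Q)} = 2^{Pd} = \begin{cases} 2^{9} = 512, & P = 3,\ d = 3 \quad \text{(direct solution)}, \\ 2^{3} = 8, & P = 1,\ d = 3 \quad \text{(optimum scaling)}, \end{cases}$$

so each further doubling of resolution costs 64 times more with the direct method than with an optimally scaling solver.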

In a previous paper [1], a new iterative solver was presented for the discrete Navier- 
Stokes equations in the first-order approximation which addressed these requirements. The 
method was based on a fully implicit Algebraic Multigrid (AMG) scheme. This paper 
describes changes to the scheme which can virtually eliminate the need for underrelaxation 
in the iterative cycle. Performance data have been obtained for a number of problems on 
both structured and unstructured computational meshes. Here results for the sudden- 
expansion test problem are presented for second-order accuracy using the defect correction 
method. 


THE DISCRETE APPROXIMATIONS 

The discrete equation sets for the flow variables are derived from a finite-volume 
discretisation of a finite-element mesh by enforcing the conservation of mass and 
momentum for an incompressible fluid. The simplest possible linear element is used: the 
triangle (in 2D), which is capable of giving second-order accurate equations. Control 
volumes are constructed around each vertex node by joining the centroid of each element to 
the centre of each side (Figure 1). Within any given element, just one flux value is used for 
the control surfaces so formed, and this is obtained by a special interpolation. The centroid 
provides the single interpolation point. A second discretisation within the element is used to 
derive the interpolation equation. Figure 2 shows three examples of the subcontrol volumes 
that have been used; the smallest is the one chosen for this work. The scheme is similar to 
those proposed by Prakash[2], Hookey[3], and Schneider and Raw[4]. 

If $v$ represents the set of nodal velocities, $v_e$ the set of interpolated velocities within
elements, and $p$ the set of nodal pressures, then enforcing the conservation laws for both
nodal control volumes and element sub-control volumes delivers the following set of
algebraic equations:

$$A(v_e)\, v + G\, p = s \tag{3}$$

$$A_e(v_e)\, v_e + F(v_e)\, v + G_e\, p = s_e \tag{4}$$

$$D\, v_e = 0 \tag{5}$$

where $A$ and $G$ are the nodal advection-diffusion and gradient operators respectively; $A_e$
and $F$ are each part of the advection-diffusion operator for elements; $G_e$ is the element
gradient operator; $D$ is the nodal divergence operator; and $s$ and $s_e$ represent the momentum
source/sink arrays for the nodal control volumes and for the element sub-control volumes,
respectively.

Figure 1: Illustrating the linear triangular element, element assembly, and the construction
of the control-volume tessellation; one control volume is highlighted.

The matrix $A_e$ is diagonal, so the solution of equation (4) is trivial; that is,

$$v_e = A_e^{-1} (s_e - F v - G_e p) \tag{6}$$

Direct substitution into equation (5) enables the following subset of coupled equations to
be formed for the nodal variables:

$$\begin{bmatrix} A(v_e) & G \\ (D A_e^{-1} F) & (D A_e^{-1} G_e) \end{bmatrix} \begin{bmatrix} v \\ p \end{bmatrix} = \begin{bmatrix} s \\ (D A_e^{-1} s_e) \end{bmatrix} \tag{7}$$

The solution of equations (6) and (7) is obtained by direct iteration using a
predictor-corrector strategy for $v_e$ and $[v\ p]$, with the AMG solver providing the coupled
solution of equation (7) for $[v\ p]$.

If upstream values are used in the enforcement of momentum conservation for nodal 
control volumes, then equation (7) will be first-order accurate. For this work, a second- 
order approximation is also required. The simplest possible second-order approximation 
was adopted using equal proportions of upstream and downstream values for the advected 
momentum across the control surfaces, equivalent to the central differencing of finite- 
difference methods. 




Figure 2. Interpolation for element velocities, $v_e$; three subcontrol volumes that have been
used for a local discrete solution of the equation of motion.

THE ITERATIVE SOLUTION METHOD
By writing equations (6) and (7) in the more concise form

$v_e = A_e^{-1}(S_e - H\varphi)$ (8)

$L(v_e)\,\varphi = f$ (9)

where

$L(v_e) = \begin{bmatrix} A(v_e) & G \\ D A_e^{-1} F & D A_e^{-1} G_e \end{bmatrix}$, $\varphi = \begin{bmatrix} v \\ p \end{bmatrix}$, $H = [F\ G_e]$, $f = \begin{bmatrix} s \\ D A_e^{-1} S_e \end{bmatrix}$,

and by writing the first- and second-order approximations of $L(v_e)$ and $f$ as $L_1$, $L_2$
and $f_1$, $f_2$ respectively, the following iterative procedure can be constructed [5],
starting with $v_e^0 = 0$ and $\varphi^0 = 0$:

$v_e^n = A_e^{-1}(S_e - H\varphi^n)$, $n \ge 0$

$L_1(v_e^n)\,\varphi^{n+1} = f_1^n$, $n < m$ (10)

$L_1(v_e^n)\,\varphi^{n+1} = f_2^n + [L_1(v_e^n) - L_2(v_e^n)]\,\varphi^n$, $n \ge m$

where m marks a suitable point in the iteration sequence for switching on the defect
correction, $(L_1(v_e^n)\varphi^n - f_1^n) - (L_2(v_e^n)\varphi^n - f_2^n)$. At convergence
$\varphi^{n+1} = \varphi^n = \varphi$, and the second-order equation

$L_2(v_e)\,\varphi = f_2$ (11)

will be satisfied within the permitted tolerance. The convergence should, moreover, proceed
at a rate determined more by the properties of $L_1$ than by those of $L_2$.
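The structure of the defect-correction iteration can be seen in a scalar toy problem. This is only a sketch: the "operators" `l1` and `l2` are plain numbers chosen for illustration, standing in for the first- and second-order operators, and the function name is hypothetical.

```python
def defect_correction(l1, f1, l2, f2, m, iterations):
    """Scalar analogue of iteration (10): only l1 is ever inverted."""
    phi = 0.0
    for n in range(iterations):
        if n < m:
            rhs = f1                        # plain first-order solve
        else:
            rhs = f2 + (l1 - l2) * phi      # defect-corrected right-hand side
        phi = rhs / l1
    return phi


# Illustrative values only: the fixed point satisfies l2 * phi = f2.
phi = defect_correction(l1=2.0, f1=3.0, l2=1.5, f2=3.0, m=2, iterations=40)
print(phi)
```

In this toy case the error contracts by the factor |1 - l2/l1| per iteration once the correction is switched on, so the rate is set by how well the first-order operator approximates the second-order one.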


The equation system

$L_1(v_e^n)\,\varphi^{n+1} = f^n$ (12)

where $f^n$ is now understood to include the defect correction if $n \ge m$, may be represented
graphically as a connected nodal network with a one-to-one correspondence between
variables (equations) and nodes; the connections between nodes represent the coupling
between equations. For like variables, there will also be a one-to-one correspondence
between connections and the edges of elements in the computational mesh. For unlike
variables, connections may be regarded as displacements in an abstract dimension. To
distinguish the nodal network from the computational mesh, it will be referred to as the
“algebraic grid” or simply the grid.

In an iterative solution procedure based on point relaxation, each node of the grid is 
visited in turn and that variable is updated/corrected entirely on the basis of local 
information (i.e., from those neighbours to which the node has direct connections). Because 
of this, a single sweep through the grid system will only see changes propagating short 
distances (i.e., of order one nodal spacing). Long range propagation is a diffusion-like 
process that requires many iterative sweeps. If $\lambda_j$ is a relevant propagation distance
expressed in units of nodal spacing, then the number of iterations required, n, will scale as

$n \sim \lambda_j^2 = (Q/Q_j)^2$ (13)

where Q is the maximum resolving power and $Q_j$ is the minimum resolving power required for
the resolution of $\lambda_j$. Since the computational cost of one iteration will scale as N, the
total number of nodes to be visited, the required computing time will scale as

$T \sim N Q^2 = Q^{d+2}$. (14)

Thus, from the grid system equivalent of equation (2)

$P = 1 + 2/d$. (15)

Clearly, solvers based on point/local relaxation can scale poorly, with P = 2 or P = 5/3 
for 2D and 3D problems, respectively. To achieve optimum P = 1 scaling it is necessary to 
have an efficient propagation of corrections over all length scales simultaneously. This 
requires multigrid methods. 
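The exponent follows from $N \sim Q^d$, so that $T \sim Q^{d+2} = N^{(d+2)/d}$. A short sketch (the function name is illustrative) evaluates it for the two cases quoted:

```python
def relaxation_cost_exponent(d):
    """Exponent P in T ~ N**P for point relaxation on a d-dimensional grid."""
    return 1.0 + 2.0 / d


for d in (2, 3):
    print(d, relaxation_cost_exponent(d))
```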

AMG methods [6,7] exploit a hierarchy of reduced equation sets (coarse grids) derived
from and including the base set (fine grid). Ideally, coarse grid generation proceeds
recursively such that each successive grid is a consistent representation of the problem at a
reduced scale of resolution, $Q_j$, with associated length scale $\lambda_j$. Just one sweep of a
relaxation procedure at this level will be sufficient to propagate changes over $\lambda_j$
(i.e., $Q = Q_j$); hence, from equation (13), n = 1. With a sufficient number of grids spanning
the complete range of length scales relevant to the problem, an efficient propagation over all
length scales can take place simultaneously within one relaxation sweep. Thus, considering the
first level of coarsening, if K is a suitably chosen restriction operator, it may be applied to
the base set (12) to form the reduced system

$L_1^c\,\varphi^c = r^c$ (16)

where $L_1^c = K L_1 K^T$. If $r^c$ is derived on the basis of the residual $r = f - L_1\varphi$,

$r^c = K r = K(f - L_1\varphi)$ (17)

then a solution of equation (16) provides a correction that can be used to improve $\varphi$:

$\varphi \leftarrow \varphi + K^T\varphi^c$ (18)

The procedure is as follows: restrict residual errors to the coarse grid using equation (17);
reduce the coarse-grid (long-range) errors by applying local relaxation methods to equation
set (16); prolongate the coarse-grid correction and update the fine grid solution using (18);
and reduce the fine-grid (short-range) errors by applying local relaxation to equation
set (12). Clearly equation (16) has the same form as equation (12) so the procedure can be
applied recursively to generate smaller equation sets for successively coarser scale
corrections. In this way a “multiscale” correction, $K^T\varphi^c$, can be assembled for
updating $\varphi$.
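The restrict, relax, prolongate, relax sequence can be sketched for two levels in plain Python. This is a minimal illustration under stated assumptions, not the solver of this paper: the test matrix is a small symmetric tridiagonal system, the elementary restriction simply sums the equations in each group (a list of fine-node indices, here called `groups`), and all function names are hypothetical. An optional pre-smoothing step is included, as in a V-cycle.

```python
def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, used only as a small test system."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def gauss_seidel(A, x, b, sweeps):
    """Point Gauss-Seidel relaxation, updating x in place."""
    for _ in range(sweeps):
        for i in range(len(b)):
            s = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def two_grid_cycle(A, x, b, groups, nu1=1, nu2=1, coarse_sweeps=50):
    x = gauss_seidel(A, list(x), b, nu1)               # pre-smoothing (fine grid)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # fine-grid residual
    rc = [sum(r[i] for i in g) for g in groups]        # r^c = K r, as in (17)
    Ac = [[sum(A[i][j] for i in gi for j in gj) for gj in groups]
          for gi in groups]                            # L^c = K L K^T
    xc = gauss_seidel(Ac, [0.0] * len(groups), rc, coarse_sweeps)  # relax (16)
    for c, g in enumerate(groups):                     # x <- x + K^T x^c, as in (18)
        for i in g:
            x[i] += xc[c]
    return gauss_seidel(A, x, b, nu2)                  # post-smoothing (fine grid)


A = poisson_1d(8)
b = [1.0] * 8
x = [0.0] * 8
for _ in range(5):
    x = two_grid_cycle(A, x, b, [[0, 1], [2, 3], [4, 5], [6, 7]])
```

Applying `two_grid_cycle` recursively to the coarse system, instead of relaxing it directly, gives the multilevel procedure described above.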

A coarsening procedure based on that devised by Lonsdale [8] for scalar field variables 
has been used to generate the reduced equation sets. This consists of seeking out the 
equations with the strongest coupling (the largest off-diagonals in the L matrices) and 
joining them together by adding the corresponding matrix coefficients. Some care is 
required in implementing the procedure [8,1]. The elementary matrix representation of 
Lonsdale's restriction operator K (dimension $N_j \times N_i$, $N_j < N_i \le N$), if required,
can be formed by simply adding the appropriate rows of the $N_i \times N_i$ unit matrix. The
reduction factors ($N_j / N_i$) may be freely chosen, though values of about 0.5 are usually used.
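A simplified sketch of this kind of coarsening is given below. It is a greedy pairwise grouping written for illustration and is not claimed to reproduce the procedure of [8]: each unassigned equation is joined to the unassigned equation with which it has the largest off-diagonal coefficient, which gives a reduction factor of about 0.5. The function names are hypothetical.

```python
def pairwise_aggregate(A):
    """Greedily pair each equation with its most strongly coupled free partner."""
    n = len(A)
    free = set(range(n))
    groups = []
    for i in range(n):
        if i not in free:
            continue
        free.discard(i)
        best, best_val = None, 0.0
        for j in sorted(free):
            if abs(A[i][j]) > best_val:      # largest off-diagonal magnitude
                best, best_val = j, abs(A[i][j])
        if best is None:
            groups.append([i])               # no coupled partner left
        else:
            free.discard(best)
            groups.append([i, best])
    return groups


def restriction_matrix(groups, n):
    """Elementary K: each row is the sum of the unit-matrix rows in one group."""
    return [[1 if j in g else 0 for j in range(n)] for g in groups]


A = [[4, -3, -1, 0],
     [-3, 4, 0, -1],
     [-1, 0, 4, -2],
     [0, -1, -2, 4]]
groups = pairwise_aggregate(A)
print(groups)                         # [[0, 1], [2, 3]]
print(restriction_matrix(groups, 4))  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```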

Since here the equation system is for coupled vector and scalar fields, the procedure is 
implemented in a way which preserves the block structure of the L matrix operator. 
Combining equations for different field variable types is thus forbidden; coarsening is only 
permitted in “real space”, equivalent to choosing a block-diagonal K matrix. Note that this
does not prevent different coarsening for different field variables. 

The process can be terminated when no further reduction in the number of equations is 
possible, and the matrix dimension is then equal to the number of continuum flow variables. 
In [1] and in this work, however, the process is actually terminated earlier at between about 
30 and 60 equations. 

The elementary K-matrix restriction combines equations in equal proportion. However, a 
better coarse grid approximation can be achieved if fine grid equations are combined in 
proportions that respect their relative importance at the coarser level of resolution. 
Therefore provision is made here for a more general, weighted restriction. For AMG
solvers, this is particularly important for uniform and non-uniform discretisations alike
because, even if an initial fine grid is a regular array of identical nodes, the algebraic
coarsening process is unlikely to preserve such uniformity. Thus, if R and P are the actual
restriction and prolongation operators to be used, then fine grid and coarse grid weighting
operators, $W$ and $W^c$, are introduced such that

$R = [W^c]^{-1} K W$ (19)

subject to the scaling rule

$R\,I\,P = I^c$ (20)

where the unit operator, $I$, for the fine grid transforms under the action of R and P into the
unit operator, $I^c$, for the coarse grid. Combining these equations gives

$W^c = K W P$. (21)

For computational expediency P has been chosen to be simply $K^T$ in this work so that the
coarse grid weighting operator is simply the fine grid operator transformed using
elementary restriction and prolongation.

For a finite-volume discretisation, a natural choice for W is the diagonal operator formed 
from the set of nodal control volumes. Equation (21) can then be simply interpreted as 
control-volume agglomeration and the restriction procedure R defined by equation (19) as 

1. Conversion of the fine grid equations into the naturally additive net flux form (W). 

2. Formation of the coarse grid equations ($K$, $K^T$).

3. A conversion of the coarse grid equations back to the normal form ($[W^c]^{-1}$).

The coarse grid approximation so produced results in a robust and an efficient solution 
algorithm. 
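For a diagonal W of nodal control volumes, the three steps reduce to a volume-weighted average over each group of fine nodes. The sketch below assumes that P is the transpose of the elementary restriction, so that the coarse weight is the agglomerated volume of the group; the function and argument names are illustrative.

```python
def weighted_restriction(groups, volumes, fine_values):
    """Apply R = [W^c]^{-1} K W for diagonal W (nodal control volumes)."""
    coarse = []
    for g in groups:
        coarse_volume = sum(volumes[i] for i in g)           # W^c: agglomerated volume
        flux = sum(volumes[i] * fine_values[i] for i in g)   # K W: additive flux form
        coarse.append(flux / coarse_volume)                  # back to the normal form
    return coarse


groups = [[0, 1], [2]]
volumes = [1.0, 3.0, 2.0]
print(weighted_restriction(groups, volumes, [1.0, 1.0, 1.0]))  # [1.0, 1.0]
print(weighted_restriction(groups, volumes, [2.0, 6.0, 5.0]))  # [5.0, 5.0]
```

The first call illustrates the scaling rule: a constant fine-grid field restricts to the same constant on the coarse grid.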

Following the R-restriction of residual errors down through the grid hierarchy, with $\nu_1$
relaxation sweeps at each level, the multiscale correction is assembled by the reverse
procedure of the upward P-prolongation of solutions (possibly scaled by $\alpha$), this time
applying $\nu_2$ relaxation sweeps following each prolongation. This is the well known V-cycle
schedule, V($\nu_1$,$\nu_2$). In this work, however, the full multi-grid cycle F($\nu_1$,$\nu_2$)
has been adopted in which the upward leg of each cycle itself contains nested V-cycles (Figure 3).
Furthermore, because the coarsest grid only contains between 30 and 60 nodes, a direct 
solver is used to obtain an accurate solution. 
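The order in which grid levels are visited can be sketched as follows. The schedule generated here is the common textbook F-cycle pattern and is an assumption; it is not read from Figure 3, and the function name is hypothetical. Level 0 is the fine grid and `coarsest` is the level handled by the direct solver.

```python
def f_cycle_levels(coarsest):
    """Sequence of grid levels visited in one F-cycle (0 = fine grid)."""
    seq = list(range(coarsest + 1))                 # initial descent to the coarsest grid
    for level in range(coarsest - 1, 0, -1):        # upward leg, excluding the fine grid
        seq.append(level)
        seq.extend(range(level + 1, coarsest + 1))      # nested V-cycle: down
        seq.extend(range(coarsest - 1, level - 1, -1))  # nested V-cycle: back up
    if coarsest > 0:
        seq.append(0)                               # final return to the fine grid
    return seq


print(f_cycle_levels(2))  # [0, 1, 2, 1, 2, 1, 0]
print(f_cycle_levels(3))  # [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, 0]
```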

Figure 3. F-cycle strategy for transferring residuals and corrections between grid levels 0
(fine grid) to 4: residuals are transferred downward and solutions upward, with $\nu_1$
pre-restriction relaxation sweeps, $\nu_2$ post-prolongation relaxation sweeps, and a direct
solver on the coarsest grid.

Two relaxation schemes have been adopted, both based on point Gauss-Seidel (PGS)
relaxation. For the intermediate coarse grids, PGS with optimum damping is used. If
$L_1^c = L + D + U$ is the standard splitting for Gauss-Seidel relaxation (L is the lower
triangular block, U is the upper triangular block, and D the diagonal of $L_1^c$), then the
algorithm for $\nu$ relaxation sweeps is

$d^i = (L + D)^{-1}(r^{i-1} - U d^{i-1})$

$z^i = L_1^c d^i$

$\alpha^i = \langle (z^i)^T r^{i-1} \rangle / \langle (z^i)^T z^i \rangle$ (22)

$\varphi^{c(i)} = \varphi^{c(i-1)} + \alpha^i d^i$

$r^i = r^{i-1} - \alpha^i z^i$.

Before prolongation, the coarse grid corrections $\varphi^c$ are also scaled by the factor

$\alpha = \langle (L_1^c\varphi^c)^T r^c \rangle / \langle (L_1^c\varphi^c)^T L_1^c\varphi^c \rangle$. (23)
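The damped sweep can be written out directly from the recurrences above. The following is a sketch for a small dense matrix stored as nested lists; the function name and the test system are illustrative. The step length is the one that minimises the Euclidean norm of the updated residual along the search direction, so the residual norm cannot increase from sweep to sweep.

```python
def damped_gs_sweeps(A, b, x, sweeps):
    """Gauss-Seidel-type sweeps with a residual-minimising step length."""
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    d = [0.0] * n
    for _ in range(sweeps):
        # Right-hand side r - U d, using the previous search direction.
        rhs = [r[i] - sum(A[i][j] * d[j] for j in range(i + 1, n)) for i in range(n)]
        new_d = [0.0] * n
        for i in range(n):                   # forward substitution with (L + D)
            new_d[i] = (rhs[i] - sum(A[i][j] * new_d[j] for j in range(i))) / A[i][i]
        d = new_d
        z = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        zz = dot(z, z)
        if zz == 0.0:                        # nothing left to correct
            break
        alpha = dot(z, r) / zz               # optimum damping factor
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * zi for ri, zi in zip(r, z)]
    return x, r


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
x, r = damped_gs_sweeps(A, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 5)
```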

For the fine grid, an approximate 4-direction, point Gauss-Seidel algorithm for 
unstructured meshes is used (4-PGS). This involves some preprocessing for the formation 
of 4 continuous line orderings of nodes such that each node is visited once only within each 
line, and lines attempt, wherever possible, to pass through each node from different 
directions. 

The residual reduction factor, or fractional error reduction for each F-cycle, $\rho$, depends
on the efficiency of the local relaxation process (smoothing) and on the quality of the coarse
grid approximation [6,7,9]. Empirical $\rho$ factors are defined and results presented for
several test problems.

Although $L_1$ does not have to be positive definite, it must have block diagonal matrices
that are suitable for solution by scalar AMG methods [6]; diagonal blocks must be at least
positive semi-definite. The first-order discretisation (based on the advection of upstream
momentum) produces block diagonal matrices for the velocity-component equations that
should satisfy that requirement. The block diagonal matrix for the pressure equations is
positive semi-definite in any case.

Boundary conditions are implicitly contained in $L_1$. At least one pressure node is
implicitly fixed in all calculations. No special measures are necessary for dealing with
boundary conditions at the lower levels of the grid system. The necessary information is
automatically transferred by the restriction operator.

Implicit underrelaxation of both velocity and pressure is commonly used to ensure
convergence of Navier-Stokes linear solvers. For this coupled AMG linear solver,
underrelaxation has not been necessary. Provided that the weighting described above is
employed in the restriction procedure, no underrelaxation has been required for any problems
tackled so far. However, a small amount of underrelaxation can improve the rates of
convergence for both inner and outer iterations. It can be implemented without prejudicing
the long-range spatial coupling as follows. All entries in the off-diagonal blocks of $L_1$ are
reduced by a factor $\omega$ and/or all entries in the diagonal blocks are increased by
$1/\omega$, with appropriate compensations of the right hand sides of the equation sets,
evaluated using previous iterates $\varphi^n$. Optimum convergence rates occur for $\omega$
values in the range $1.0 > \omega > 0.9$.

Note that it is also possible to relax the coupling between like variables by increasing
just the diagonal entries of the relevant diagonal block and making the appropriate
right-hand side compensations. This is not recommended. It loosens the spatial coupling that
AMG is supposed to be dealing with, which results in a degradation of convergence
performance (including the scaling).

PERFORMANCE 

The solver has been applied to a number of well established test problems. Here flow in
a channel with a sudden asymmetric expansion is presented. This problem incorporates 
several features of complex fluid behaviour that can present difficulties for solvers, 
particularly at high Reynolds numbers (e.g., singularities, recirculation, boundary layers, 
entering flows, outlet flows). Some of these features have been isolated for special 
investigation by those involved in the development of multigrid methods. 

Of interest are the quality of the second-order solutions, the rates of convergence and, in
particular, the mesh dependence of both of these aspects of performance. To assist in the
presentation and analysis of results it will be useful to introduce mesh resolution and 
grading factors and to define the convergence factors. 

Mesh Resolution and Grading Factors 

The inverse nodal separation (linear resolution) and its variation with direction and 
position (grading) is used to characterize the meshes. The global extremes of the resolution 
and grading will be sufficient for most purposes. Thus, reference is made to the maximum 
linear resolving power Q, the maximum global grading factor $\Gamma$, and the maximum local
grading factor $\gamma$. Q is defined as the ratio of the largest characteristic length scale to
the closest nodal spacing. $\Gamma$ is defined as the ratio of the maximum to minimum nodal
separations for elements in the mesh regardless of their position. The local grading factor
for any node in the mesh is the ratio of the largest to the smallest separation of the node
from its immediate neighbours (i.e., for elements common to the node). Directional aspects
are thus largely ignored except where reference is made to longitudinal and transverse
resolution and grading factors $Q_x$, $\Gamma_x$, $\gamma_{xx}$, $Q_y$, $\Gamma_y$, and
$\gamma_{yy}$, respectively. Aspect ratio $\gamma_{xy}$ will also be referred to. In this case
the nodal separations in any chosen element are both
selected and weighted according to their degree of alignment with the relevant direction.
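As a small illustration of the local grading factor, the sketch below computes the ratio of the largest to the smallest separation of a node from its immediate neighbours. The function name and the coordinates are hypothetical; nodes are given as coordinate tuples.

```python
import math


def local_grading_factor(node, neighbours):
    """Ratio of largest to smallest separation of a node from its neighbours."""
    separations = [math.dist(node, p) for p in neighbours]
    return max(separations) / min(separations)


# Illustrative node with neighbours at distances 1, 2 and 4.
print(local_grading_factor((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (-4.0, 0.0)]))  # 4.0
```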

Convergence factors 

Convergence characteristics will be quantified in terms of the convergence factor $\rho^n$,
where

$\rho^n = \|\delta\varphi^n\|_\infty / \|\delta\varphi^{n-1}\|_\infty$ (24)

where $\delta\varphi^n$ is the multiscale correction for the iteration index n. Thus, the larger
the rate of convergence, the smaller the convergence factor. The average convergence factor
$\rho$ for a sequence of N Navier-Stokes (i.e., outer) iterations is

$\rho = \{ \|\delta\varphi^N\|_\infty / \|\delta\varphi^0\|_\infty \}^{1/N} = \{ \prod_{n=1}^{N} \rho^n \}^{1/N}$ (25)

The residual reduction factors, $\rho$ and $\rho_i$, for inner iterations are defined similarly
but in terms of the Euclidean norm of the residual errors, that is

$\rho_i = \|r^i\|_2 / \|r^{i-1}\|_2$ (26)

where in this case $r^i$ is the residual following the F-cycle, index i.
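Given a recorded history of norms, the per-iteration factors and their geometric-mean average can be computed as in the following sketch. The function names and the sample history are illustrative and are not data from this work.

```python
def convergence_factors(norms):
    """Ratios of successive norms, one per iteration."""
    return [norms[n] / norms[n - 1] for n in range(1, len(norms))]


def average_convergence_factor(norms):
    """Geometric mean of the per-iteration factors over the whole sequence."""
    steps = len(norms) - 1
    return (norms[-1] / norms[0]) ** (1.0 / steps)


history = [1.0, 0.5, 0.25, 0.125]            # made-up norms for illustration
print(convergence_factors(history))          # [0.5, 0.5, 0.5]
print(average_convergence_factor(history))
```

Because the product of the individual factors telescopes, only the first and last norms are needed for the average.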

Various F-cycle schedules have been tried from F(1,0) to F(8,2). On the fine grid, $\nu_2 = 1$
actually corresponds to one application of the 4-PGS smoother.

In practice, the important convergence parameter is the fractional reduction of error per
unit of computing time, which may not be quite the same as the reduction of error per
iteration as defined in equation (26). However, with a fixed number, $\nu$, of F-cycles per
iteration the computing time per iteration will be more or less constant; then, as long as
the inner residual reduction over the $\nu$ F-cycles, $\rho^\nu$, is much smaller than the outer
convergence factor, the latter will be equivalent to the convergence rate in time for all
practical purposes. The number of F-cycles does not have to be large to satisfy this
requirement. Also, there is little if anything to be gained by insisting that the inner
reduction $\rho^\nu$ be extremely small, since much of the work done will be immediately undone
when the non-linear terms are updated in the outer iteration.


ASYMMETRIC SUDDEN EXPANSION TEST PROBLEM 

To test the solver on a problem with inflow and outflow boundary conditions, it has been
applied to the asymmetric, sudden-expansion problem. This is a high aspect ratio problem,
so it offers a convenient test for the performance of the solver on meshes with highly
elongated elements.

Flow enters a two-dimensional channel with a parabolic inlet velocity profile. Some
distance from the inlet there is a one-sided step increase in channel width to 3/2 the
original. Flow separates at the re-entrant corner and a re-circulation zone is established
after the step. The axial extent of the circulation is marked by the point of re-attachment, or
the point at which uni-directional flow is re-established across the entire width of the
channel. This depends on the Reynolds number, Re. Results have been published for
Reynolds numbers up to and, in some cases, exceeding Re = 250. Re is based on step
height and mean inlet velocity (note that this definition gives values 6 times smaller than
those based on hydraulic diameter and maximum inlet velocity).
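The factor of 6 can be checked with a short worked calculation. The symbols below are introduced only for this illustration: $H$ is the inlet channel width, $s = H/2$ the step height (since the channel widens to $3H/2$), $D_h = 2H$ the hydraulic diameter of a plane channel of width $H$, and $U_{max} = \tfrac{3}{2}\bar{U}$ the peak of a parabolic profile with mean velocity $\bar{U}$.

```latex
\frac{Re_h}{Re}
  = \frac{D_h \, U_{max}}{s \, \bar{U}}
  = \frac{(2H)\,\bigl(\tfrac{3}{2}\bar{U}\bigr)}{\bigl(\tfrac{1}{2}H\bigr)\,\bar{U}}
  = 6
```

For example, Re = 16.67 corresponds to $Re_h = 100$.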

A significant length of the expanded channel (exceeding 3 hydraulic diameters) needs to 
be modelled to ensure that the imposed outlet boundary condition does not unduly 
influence the behaviour upstream. Thus, the problem is bound to be one of large aspect
ratio (~10) and, in view of the need for fine resolution near the point of separation, the
discretisation could prove to be nodally expensive if uniform meshes are used. Thus, only
non-uniform meshes have been adopted for this investigation and results for just one 
unstructured mesh type have been selected for presentation. 

The prototype triangulation is illustrated in Figure 4. It consists of 81 proto-elements
which have been assembled to give the highest resolution at the point of separation and so
that the lateral resolving power $Q_y$ is maintained moderately high up to the point of
re-attachment. The actual meshes used were obtained by a q-fold nested refinement of each
proto-element into as many as $q^2 = 64$ congruent triangles, giving a finest mesh of
5184 elements (2717 nodes). The mesh is anisotropic and inhomogeneous with grading
factors $\gamma_{xx} = 4$, $\gamma_{yy} = 4$, $\gamma_{xy} = 5.3$, $\Gamma_x = 32$, $\Gamma_y = 4$.
Dirichlet boundary conditions for velocity
and free pressure boundary conditions apply on all surfaces except the outlet. The latter
(continuitive and constant pressure) was placed 38 step lengths from the expansion.



consisting of 81 proto-elements. = SF^q ; Qy = 3Fyq ; where q = level of nested 

refinement. $\Gamma_x = 32$; $\Gamma_y = 4$; $\gamma_{xx} = 4$; $\gamma_{yy} = 4$; $\gamma_{xy} = 5.3$.


The reduction factors for this test problem were within the expected range for point
Gauss-Seidel relaxation. Table 1 gives the average values for a low Reynolds number. Both
definitions of Reynolds number are used (i.e., the first, Re, is based on step height and
average inlet velocity, and the second, $Re_h$, is based on hydraulic diameter and maximum
inlet velocity).

N              1236    2133    3273    4656    8151
q                 3       4       5       6       8
ρ[F(1,0)]      .109    .159    .184    .215    .306
ρ[F(2,1)]      .042    .059    .091    .114    .143

Table 1: Reduction factors for the asymmetric-sudden-expansion test problem;
Re = 16.67; $Re_h$ = 100.


Convergence factors for the finest mesh for the same range of Reynolds numbers are
presented in Table 2. This reveals slower rates of convergence; nevertheless, these rates are
still better than those for segregated solution methods. In Table 3, typical values for ρ are
given at four different levels of refinement at just three selected Reynolds numbers.

Re        16.67      50     100     150     200
Re_h        100     300     600     900    1200
ρ          .426    .587    .684    .754    .816

Table 2: Convergence factors for the asymmetric-sudden-expansion test problem;
level of refinement q = 8; number of unknowns = 8151.

N              2133    3273    4656    8151
q                 4       5       6       8
ρ(Re=16.7)     .464    .432    .426    .426
ρ(Re=50)       .602    .608    .587    .587
ρ(Re=150)      .911    .807    .771    .754

Table 3: Convergence factors for the asymmetric-sudden-expansion test problem;
N = number of unknowns; q = level of nested refinement.

The convergence performance would appear to be better than that achieved by Dick and
Linden [10], who obtained second-order accurate, coupled solutions to the same test
problem discretised using a flux-difference splitting approach. They also used a
defect-correction scheme, but their solver was based on a geometric (FAS) multigrid method.
Their published result for the case corresponding here to Re = 100 was ρ = 0.81, which
compares with ρ = 0.68 in Table 2. Dick and Linden also reported a deterioration in
convergence performance with mesh refinement, which has not been observed in this work.
The evidence is for constant or improving convergence rates with mesh refinement
(Table 3).


Navier-Stokes Performance

The axial extent of the recirculation eddy following the step expansion will be used as
the gauge for assessing the quality of the solutions. Experimental data is available, but not
for a truly parabolic inlet velocity profile. Predictions of the experiment would have to be 
based, therefore, on the measured profile, which is known to result in a short eddy. Since 
over-diffusive calculational methods would tend to underpredict the eddy length anyway, 
there could well be fortuitously good first-order calculations of this experiment wherever a 
parabolic inlet velocity profile has been mistakenly used. Here such complications are 
avoided by assessing the performance against other calculations of the idealised problem 
only. Thus the results are compared with the higher-order accurate calculations of Hutton 
and Smith [11] and with the first and second-order accurate calculations of Shaw [12]. 

For Reynolds numbers up to Re = 200, the resolution requirement should be satisfied for
the mesh specified in Figure 4 (for q = 8). Results for the range Re = 16.7 to Re = 200 are
given in Figure 5 as the 5 filled-circle data points. For comparison, two sets of data from
Hutton and Smith are plotted, one as a continuous curve, which was obtained using a
coarse mesh of 69 biquadratic rectangular elements (246 nodes), and the other as 4
open-circle data points obtained using a finer mesh of 256 quadratic triangular elements
(565 nodes). The agreement is within 2% in all cases.

Figure 5: Length of the recirculation eddy versus Reynolds number for the asymmetric sudden
expansion: a comparison with the published results of Hutton and Smith and Shaw.


Five open-square data points from the calculations of Shaw, using 600 rectangular linear 
elements are also shown for the Reynolds number range Re = 12.5 to Re = 100. The two 
lower points at Re = 12.5 and Re = 25 are second order accurate and are consistent with the 
other data. The remaining three points were obtained using a first-order scheme for 
advection. They underpredict the length of the recirculation by as much as 27% at 
Re = 100. Shaw attributed this to the coarseness of the mesh and the false numerical 
diffusion associated with the first-order upwind scheme. 


DISCUSSION AND GENERAL COMMENTS 

The above results give a representative sample of the tests to which the solver has been 
applied. On the basis of all tests, the following general comments are made and the 
subsequent conclusions drawn. 

It has not been found necessary to use any underrelaxation of variables to ensure 
convergence of the linear solver. The rates of reduction of the residual errors within inner 
iterations are typical of those to be expected for the PGS-based relaxation methods used and 
the simple inter-grid transfer operators being exploited. Note that, from the point of view of 
the coarse grid approximation, the values quoted are for the worst Navier-Stokes cases; 
those with low Reynolds numbers. They are nevertheless more than adequate for the 
problems attempted. The weak dependence of ρ on mesh size is an inevitable consequence
of the primitive inter-grid transfer operators used. However, it is sufficiently weak to have
little if any impact on the scaling of ρ. A higher order interpolation would be required for a
better coarse grid approximation, and this is unlikely to be cost effective. 

Providing the computational mesh has a sufficient resolving power for the problem, rapid
convergence superior to that possible with segregated solution methods is achieved. When,
however, the mesh has insufficient resolution the convergence can stall ($\rho \to 1$) unless an
explicit underrelaxation of velocity is exploited. This is thought to be due to the influence
of the dispersive truncation error on the convergence process. For finer meshes, explicit
relaxation is not required and rates of convergence improve with refinement, asymptotically
approaching mesh-independent values as the resolution is increased (i.e., $P \to 1$ as
$Q \to \infty$). No evidence has been found for P > 1 in any applications so far. If this proves to
be a better performance than that achieved with other defect-correction multigrid
algorithms, the accuracy of the present discretisation may be responsible.

CONCLUSIONS 

An efficient and robust iterative numerical method is presented for solving the coupled
equations of motion for viscous fluids in the discrete second-order approximation.
Provided that discretisation has sufficient spatial resolution for the flow field, a rapid
convergence to machine accuracy is achieved that is almost mesh independent insofar as
the convergence rates either improve or are maintained for increased nodal concentration.

With sufficient resolution, the method is also robust to the extent that no underrelaxation 
of flow variables has been required to ensure convergence. However, small amounts of 
underrelaxation can improve convergence rates. Converged solutions can also be obtained 
when the mesh resolution is insufficient to resolve the flow field, but in the more extreme 
cases of low resolution some explicit underrelaxation is necessary to prevent a stalling of 
the outer-iteration convergence. 

The discretisation provides accurate solutions on relatively coarse meshes. This is 
probably due to the interpolation scheme used for the momentum flux within elements, 
which is based on a local discrete solution of the equations of motion within the element.

Abstract

In this paper we describe some classes of multigrid methods for solving large
linear systems arising in the solution by finite difference methods of certain
boundary value problems involving Poisson’s equation on rectangular regions.
If parallel computing systems are used, then with standard multigrid methods
many of the processors will be idle when one is working at the coarsest grid
levels. We describe the use of multiple coarse grid multigrid (MCGMG) methods.
Here one first constructs a periodic set of equations corresponding to the
given system. One then constructs a set of coarse grids such that for each grid
corresponding to the grid size h there are four grids corresponding to the grid
size 2h. Multigrid operations such as restriction of residuals and interpolation
of corrections are done in parallel at each grid level. For suitable choices
of the multigrid operators the MCGMG method is equivalent to the parallel
superconvergent multigrid (PSMG) method of Frederickson and McBryan. The
convergence properties of MCGMG methods can be accurately analyzed using
spectral methods.


1 Introduction 


In this paper we describe some classes of multigrid methods for solving large linear
systems arising from the numerical solution by finite difference methods of certain
boundary value problems involving Poisson’s equation

$-u_{xx} - u_{yy} = f(x,y)$ (1.1)

on rectangular domains. Here $f(x,y)$ is a given function. The solution $u(x,y)$ of
(1.1) is required to satisfy the Dirichlet condition

$u(x,y) = g(x,y)$ (1.2)

on the boundary. The standard 5-point finite difference equation is used to derive a
linear system of the form

$Au = b$ (1.3)

Standard multigrid methods often exhibit excellent convergence rates on sequential
computing machines. However, if parallel machines are used, many of the processors
will be idle when the program is working on the coarse grid levels. Frederickson
and McBryan [3] developed and analyzed a method, called the “parallel superconvergent
multigrid (PSMG) method.” With the PSMG method the same number of grid
points are used and more of the processors are used at all grid levels. For other works
dealing with the idea of using more than one coarse grid to speed up convergence cf.
[2], [4], [6], [9].

In this paper we describe a class of multigrid methods which we refer to as “multiple
coarse grid multigrid methods” (MCGMG methods) where, as in the case of
PSMG methods, more than one coarse grid is used at each coarse grid level.

With a MCGMG method, one first constructs a periodic set of equations corresponding
to the given system. One then constructs a set of coarse grids such that for
each grid corresponding to the grid size h there are four grids corresponding to the
grid size 2h. The actual number of coarse grids depends on which coarsening scheme is
used. There are many ways to choose the multigrid operators for a MCGMG method.
For suitable choice of the operators the MCGMG method is equivalent to the PSMG
method of Frederickson and McBryan. The convergence properties of MCGMG methods
can be accurately analyzed using spectral methods; see, e.g., [7]. The analysis of
many other iterative methods based on such a periodic set of equations can be found
in, e.g., [1], [5], [8].

In Section 2, we derive Dirichlet problems and construct related discrete periodic
problems corresponding to (1.1) and (1.2). In Section 3, we apply a procedure to
derive a discrete periodic problem corresponding to a discrete Dirichlet problem.
In Section 4, we discuss the use of MCGMG methods for solving discrete periodic
problems. In Section 5, we show that a certain choice of multigrid operators can make
a MCGMG method equivalent to some well known parallel multigrid methods. We
also give convergence factors for the MCGMG methods and the standard multigrid
methods for discrete Dirichlet problems.

It should be noted that the methods described in the paper have only been shown 
to apply to problems involving Poisson’s equation on the rectangle with Dirichlet 
boundary conditions. However, it can be shown that with slight modifications, the 
method also applies to problems involving Neumann boundary conditions. 

As pointed out by the referee, the methods used in the present paper are closely 
related to more general methods based on the use of symmetries; see for example [3] 
and the references given therein. 


2 Discrete Dirichlet Problems and Discrete Periodic Problems

In this section we consider classes of discrete Dirichlet problems and discrete periodic
problems in one and two dimensions. First, we consider the Dirichlet problem
involving the differential equation

$-u'' = f(x), \quad 0 < x < 1$ (2.1)

and the boundary conditions

$u(0) = \alpha, \quad u(1) = \beta$ (2.2)

To define a discrete Dirichlet problem we choose an even positive integer N and the
grid size $h = N^{-1}$ and seek a function $u(x)$ defined on the points
$x = 0, h, 2h, \ldots, Nh$ such that

$2u(x) - u(x+h) - u(x-h) = h^2 f(x), \quad x = h, 2h, \ldots, (N-1)h$
$u(0) = \alpha, \quad u(1) = \beta$ (2.3)

For the case N = 4, this leads to the linear system

$\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} u(x_1) \\ u(x_2) \\ u(x_3) \end{bmatrix} = \begin{bmatrix} h^2 f(x_1) + \alpha \\ h^2 f(x_2) \\ h^2 f(x_3) + \beta \end{bmatrix}$ (2.4)

Since the matrix of the system (2.4) is nonsingular, a unique solution exists for any
$\alpha$, $\beta$ and $f(x)$.

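Let us now consider a periodic problem with period $P = 1$ based on (2.1). We
require that $u(x)$ be periodic with period $P$ and that (2.1) holds for all $x$. We also
require that $f(x)$ be periodic with period $P$ and that

$$\int_0^P f(x)\,dx = 0. \tag{2.5}$$

We now define a discrete periodic problem as follows. We require that $u(x)$ be
periodic of period $P$ on the grid points $0, \pm h, \pm 2h, \ldots$, and that $u(x)$ satisfy

$$2u(x) - u(x+h) - u(x-h) = h^2 f(x), \qquad x = 0, \pm h, \pm 2h, \ldots \tag{2.6}$$

We also assume that $f(x)$ is periodic of period $P$ and that, instead of (2.5), we have

$$\sum_{j=0}^{N-1} f_j = 0, \tag{2.7}$$

where $h = P/N$ and where $f_j = f(x_j)$, $j = 0, 1, \ldots, N-1$, and $x_j = jh$.

To actually solve the periodic problem defined by (2.6) it is sufficient to consider
a finite subset of points. Thus in the case $N = 4$ we have

$$\begin{aligned} 2u_0 - u_{-1} - u_1 &= h^2 f_0 \\ 2u_1 - u_0 - u_2 &= h^2 f_1 \\ 2u_2 - u_1 - u_3 &= h^2 f_2 \\ 2u_3 - u_2 - u_4 &= h^2 f_3 \\ 2u_4 - u_3 - u_5 &= h^2 f_4 \end{aligned} \tag{2.8}$$

where $u_j = u(jh)$ and $f_j = f(jh)$. By periodicity we have $u_{-1} = u_3$ and $u_5 = u_1$
(and likewise $u_4 = u_0$ and $f_4 = f_0$, so the last equation repeats the first).
Thus we obtain the system

$$\begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u(x_0) \\ u(x_1) \\ u(x_2) \\ u(x_3) \end{pmatrix} = \begin{pmatrix} h^2 f(x_0) \\ h^2 f(x_1) \\ h^2 f(x_2) \\ h^2 f(x_3) \end{pmatrix} \tag{2.9}$$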


It can be shown that the matrix $A$ of the above system is singular and the rank is
$N - 1 = 3$. Since the null space of $A$ is spanned by the vector $(1\ 1\ 1\ 1)^T$ and since
the system is consistent by (2.7), it follows that (2.9) has a solution which is unique
to within an additive constant.

For general $N$, the eigenvalues of the operator defined by the left member of (2.6)
are

$$\nu_s = 2 - 2\cos(2 s \pi h), \qquad s = 0, 1, \ldots, N-1, \tag{2.10}$$

and the corresponding eigenvectors are

$$v^{(s)}(x) = e^{2\pi i s x}, \qquad s = 0, 1, \ldots, N-1. \tag{2.11}$$
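The rank, null space, and eigenvalue statements above can be checked numerically. The sketch below assumes NumPy; `periodic_matrix` is a hypothetical helper that builds the unscaled periodic matrix of the type shown in (2.9) for a general $N$.

```python
import numpy as np

def periodic_matrix(N):
    """Circulant matrix with 2 on the diagonal and -1 on the cyclic neighbours."""
    I = np.eye(N)
    return 2.0 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)

N = 8
h = 1.0 / N
A = periodic_matrix(N)

rank = np.linalg.matrix_rank(A)                    # expected to be N - 1
null_residual = np.linalg.norm(A @ np.ones(N))     # constants lie in the null space

# Compare the computed spectrum with formula (2.10).
expected = np.sort(2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(N) * h))
computed = np.sort(np.linalg.eigvalsh(A))
```

The eigenvalue for $s = 0$ is zero, which is the singularity responsible for the additive constant in the solution.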


For the two-dimensional case we first consider the Dirichlet problem involving the
Poisson equation

$$-u_{xx} - u_{yy} = f(x,y), \qquad 0 < x < 1, \quad 0 < y < 1 \tag{2.12}$$

with

$$u(x,y) = g(x,y) \tag{2.13}$$

on the boundary of the square $0 < x < 1$, $0 < y < 1$. To define a discrete Dirichlet
problem we choose a positive integer $N$ and the grid size $h = N^{-1}$ and we seek a
function $u(x,y)$ defined on the grid points $(jh, kh)$, $j, k = 0, 1, \ldots, N$, such that

$$\begin{cases} 4u(x,y) - u(x+h,y) - u(x-h,y) - u(x,y+h) - u(x,y-h) = h^2 f(x,y), & x, y = h, 2h, \ldots, (N-1)h, \\ u(x,y) = g(x,y), & x = 0 \text{ and } x = 1;\ y = h, 2h, \ldots, (N-1)h, \\ & y = 0 \text{ and } y = 1;\ x = h, 2h, \ldots, (N-1)h. \end{cases} \tag{2.14}$$

Using (2.14) one obtains a linear system of the form

$$Au = b \tag{2.15}$$

where $A$ is an $(N-1)^2$ by $(N-1)^2$ matrix. As in the one-dimensional case, the
matrix $A$ is nonsingular; hence, a unique solution to (2.15) exists.

As in the one-dimensional case we can define a discrete periodic problem with
periods $P = 1$ in both the $x$-direction and the $y$-direction. We require that

$$4u(x,y) - u(x+h,y) - u(x-h,y) - u(x,y+h) - u(x,y-h) = h^2 f(x,y) \tag{2.16}$$

for $x, y = 0, \pm h, \pm 2h, \ldots$. Also, we assume that $f(x,y)$ is periodic with period $P$ in
$x$ and $y$ and that

$$\sum_{j=0}^{N-1} \sum_{k=0}^{N-1} f(x_j, y_k) = 0. \tag{2.17}$$
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It can be shown that if (2.17) holds then a solution to the discrete periodic problem 
defined by (2.16) exists and is unique to within an additive constant. Moreover, the 
eigenvalues and eigenvectors of the discrete operator defined by the left member of 
(2.16) are, respectively, given by 

$$\nu_{s,t} = 4 - 2\cos(2\pi s h) - 2\cos(2\pi t h) \tag{2.18}$$

and

$$v^{(s,t)}(x,y) = e^{2\pi i (s x + t y)}, \qquad s, t = 0, 1, \ldots, N-1. \tag{2.19}$$


3 Construction of Discrete Periodic Problems 


In this section we describe a procedure for constructing a discrete periodic problem 
corresponding to a given discrete Dirichlet problem of the type defined in Section 2. 


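We will illustrate the procedure for a problem in one dimension with $h = 1/4$ and
$N = 4$. The procedure for the two dimensional cases is similar. From (2.4) we obtain
the system

$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \tag{3.1}$$

where $f_i = f(x_i)$, $i = 1, 2, 3$, and

$$\begin{cases} b_1 = h^2 f_1 + \alpha \\ b_2 = h^2 f_2 \\ b_3 = h^2 f_3 + \beta \end{cases} \tag{3.2}$$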

We now define $b_j$ for $j = 0, \pm 1, \pm 2, \ldots$ as follows:

$$\begin{cases} 0 = b_0 = b_4 = b_{-4} = b_8 = b_{-8} = \cdots \\ b_1 = -b_{-1} = -b_7 = b_{-7} = b_9 = -b_{-9} = \cdots \\ b_2 = -b_{-2} = -b_6 = b_{-6} = b_{10} = -b_{-10} = \cdots \\ b_3 = -b_{-3} = -b_5 = b_{-5} = b_{11} = -b_{-11} = \cdots \end{cases} \tag{3.3}$$

Clearly we have $b_{j+8} = b_j$ for $j = 0, \pm 1, \pm 2, \ldots$ and

$$\sum_{j=j^*}^{j^*+7} b_j = 0 \tag{3.4}$$

for $j^* = 0, \pm 1, \pm 2, \ldots$.

We now consider the system

$$2w_j - w_{j+1} - w_{j-1} = b_j, \qquad j = 0, \pm 1, \pm 2, \ldots \tag{3.5}$$

where we require that

$$w_{j+8} = w_j, \qquad j = 0, \pm 1, \pm 2, \ldots \tag{3.6}$$

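It is easy to show that a necessary and sufficient condition that $w$ is a solution of
(3.5) - (3.6) is that $w$ is a solution of the system

$$\begin{pmatrix} 2 & -1 & 0 & 0 & 0 & 0 & 0 & -1 \\ -1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 \\ -1 & 0 & 0 & 0 & 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} w_{-4} \\ w_{-3} \\ w_{-2} \\ w_{-1} \\ w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} 0 \\ b_{-3} \\ b_{-2} \\ b_{-1} \\ 0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 0 \\ -b_3 \\ -b_2 \\ -b_1 \\ 0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} \tag{3.7}$$

It is also easy to show that the rank of the matrix of the system (3.7) is 7 and that
the null space is spanned by the vector $(1\ 1\ 1\ 1\ 1\ 1\ 1\ 1)^T$. Therefore, because of (3.4)
the system is consistent and has a solution which is unique to within an additive
constant.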


It should also be noted that if

$$u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \tag{3.8}$$

is a solution of the original system (3.1) then $\hat{u}$ is a solution of the expanded system
(3.7) where

$$\hat{u} = \begin{pmatrix} 0 & -u_3 & -u_2 & -u_1 & 0 & u_1 & u_2 & u_3 \end{pmatrix}^T. \tag{3.9}$$

Let $w$ be any solution of the expanded system (3.7). Then since (3.7) has a unique
solution to within an additive constant it follows that for some constant $c$

$$\begin{cases} u_1 = w_1 + c \\ u_2 = w_2 + c \\ u_3 = w_3 + c \end{cases} \tag{3.10}$$

If one requires that the sum of the components of $w$ vanish, then $w = \hat{u}$ must hold,
since the sum of the components of $\hat{u}$ vanishes.

We remark that the process of replacing a vector $w$ by a vector $w' = w + c$ such
that the sum of the components of $w'$ vanishes is referred to as purification. Thus, if
$w$ is a vector of order $N$ and if $w'$ is given by

$$w'_i = w_i - \frac{1}{N} \sum_{j=1}^{N} w_j \tag{3.11}$$

for $i = 1, 2, \ldots, N$, then $w'$ is the purified vector corresponding to $w$ and we let

$$w' = \mathcal{P}(w) \tag{3.12}$$
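The construction of this section can be sketched as follows: extend the right-hand side of (3.1) oddly as in (3.3), solve the singular periodic system (3.7), purify as in (3.11), and read off the Dirichlet solution. This is an illustrative sketch assuming NumPy; the function names and the sample data `b` are hypothetical, and a least-squares solve stands in for whatever solver one would use in practice.

```python
import numpy as np

def purify(w):
    """Purification (3.11): subtract the mean so the components sum to zero."""
    return w - w.mean()

def solve_via_periodic_extension(b):
    """Solve the Dirichlet system (3.1) through the extended periodic system."""
    b = np.asarray(b, dtype=float)
    N = b.size + 1
    # Odd extension (3.3), ordered j = -N, ..., N-1.
    b_ext = np.concatenate(([0.0], -b[::-1], [0.0], b))
    I = np.eye(2 * N)
    A_ext = 2.0 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)
    # A_ext is singular; lstsq returns one solution of the consistent system.
    w = np.linalg.lstsq(A_ext, b_ext, rcond=None)[0]
    w = purify(w)
    return w[N + 1:]                     # components j = 1, ..., N-1

b = np.array([1.0, 2.0, 0.5])
u_periodic = solve_via_periodic_extension(b)

A = 2.0 * np.eye(3) - np.eye(3, k=1) - np.eye(3, k=-1)
u_direct = np.linalg.solve(A, b)
```

The purified periodic solution restricted to $j = 1, 2, 3$ agrees with the direct solve because the extended vector (3.9) already has zero component sum.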


4 Multiple Coarse Grid Methods 

4.1 One Dimensional Case 

Let $x_j = jh$ with $h = 1/N$ and

$$\Omega_h = \{x_j \mid j = 1 - N, \ldots, N\} \tag{4.1}$$

be a grid on the interval $(-1, 1]$, where $N = 2^k$ for some positive integer $k$. We
construct two coarse grids in such a way that all the even-numbered grid points
belong to one coarse grid and all the odd-numbered grid points belong to another.
Then, we have

$$\Omega_{2h}^{(-)} = \{x_j \mid x_j \in \Omega_h \text{ and } j \text{ even}\}, \tag{4.2}$$

$$\Omega_{2h}^{(+)} = \{x_j \mid x_j \in \Omega_h \text{ and } j \text{ odd}\}. \tag{4.3}$$

Figure 1 illustrates the grids on two levels, $h$ and $2h$, for the case $N = 4$.

A two-level MCGMG algorithm for the above problem is given in Figure 2. For
the following analysis, we assume that the full weighting restriction of residuals and
linear interpolation of corrections are used.

Figure 1: Two-Level Grids in 1D with $h = 1/4$ (fine grid points $j = -3, \ldots, 4$, from $x = -1$ to $x = 1$)

Algorithm: MCGMG2L($A_h$, $u_h^{(0)}$, $b_h$)

1. Do $m_1$ pre-smoothing iterations using the smoothing iterative method (e.g.,
damped Jacobi method) to obtain $u_h'$.

2. Compute the residual $r_h = b_h - A_h u_h'$, restrict the residual onto the coarse grids
and perform purification defined in (3.11) if necessary to obtain

$$r_{2h}^{(+)} = \mathcal{P}(R_h^{(+)} r_h, z_{2h}^{(+)}), \qquad r_{2h}^{(-)} = \mathcal{P}(R_h^{(-)} r_h, z_{2h}^{(-)}),$$

where $z_{2h}^{(+)}$ and $z_{2h}^{(-)}$ are eigenvectors in the null spaces of $A_{2h}^{(+)}$ and $A_{2h}^{(-)}$,
respectively.

3. Solve the coarse grid systems

$$A_{2h}^{(+)} \delta_{2h}^{(+)} = r_{2h}^{(+)}, \qquad A_{2h}^{(-)} \delta_{2h}^{(-)} = r_{2h}^{(-)}$$

to obtain the purified solutions $\delta_{2h}^{(+)}$ and $\delta_{2h}^{(-)}$.

4. Interpolate $\delta_{2h}^{(+)}$ and $\delta_{2h}^{(-)}$ onto the fine grid to obtain the new approximate
solution

$$u_h'' = u_h' + \tfrac{1}{2}\left(P_h^{(+)} \delta_{2h}^{(+)} + P_h^{(-)} \delta_{2h}^{(-)}\right).$$

5. Do $m_2$ post-smoothing iterations using the smoothing iterative method and
purify the result, if needed, to obtain $u_h^{(1)}$.

Figure 2: The 1D Two-Level MCGMG Algorithm

The full weighting restriction is defined by

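$$(R_h^{(+)} r_h)(x) = \begin{cases} \tfrac{1}{4}\left(r_h(x-h) + 2r_h(x) + r_h(x+h)\right), & x \in \Omega_{2h}^{(+)} \\ 0, & x \in \Omega_{2h}^{(-)} \end{cases} \tag{4.4}$$

$$(R_h^{(-)} r_h)(x) = \begin{cases} 0, & x \in \Omega_{2h}^{(+)} \\ \tfrac{1}{4}\left(r_h(x-h) + 2r_h(x) + r_h(x+h)\right), & x \in \Omega_{2h}^{(-)} \end{cases} \tag{4.5}$$

and the linear interpolation is defined by

$$(P_h^{(+)} \delta_{2h}^{(+)})(x) = \begin{cases} \delta_{2h}^{(+)}(x), & x \in \Omega_{2h}^{(+)} \\ \tfrac{1}{2}\left(\delta_{2h}^{(+)}(x-h) + \delta_{2h}^{(+)}(x+h)\right), & x \in \Omega_{2h}^{(-)} \end{cases} \tag{4.6}$$

$$(P_h^{(-)} \delta_{2h}^{(-)})(x) = \begin{cases} \tfrac{1}{2}\left(\delta_{2h}^{(-)}(x-h) + \delta_{2h}^{(-)}(x+h)\right), & x \in \Omega_{2h}^{(+)} \\ \delta_{2h}^{(-)}(x), & x \in \Omega_{2h}^{(-)} \end{cases} \tag{4.7}$$

The coarse grid difference operators are defined by the 3-point difference formula,
e.g.

$$(A_{2h}^{(+)} \delta_{2h}^{(+)})(x) = (2h)^{-2}\left[2\delta_{2h}^{(+)}(x) - \delta_{2h}^{(+)}(x-2h) - \delta_{2h}^{(+)}(x+2h)\right], \qquad x \in \Omega_{2h}^{(+)} \tag{4.8}$$

$$(A_{2h}^{(-)} \delta_{2h}^{(-)})(x) = (2h)^{-2}\left[2\delta_{2h}^{(-)}(x) - \delta_{2h}^{(-)}(x-2h) - \delta_{2h}^{(-)}(x+2h)\right], \qquad x \in \Omega_{2h}^{(-)} \tag{4.9}$$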

The $2h$ coarse grids can be divided into even coarser grids in a similar way. Figure 3
illustrates all the grids on three levels, $h$, $2h$ and $4h$, for the case $N = 4$. Figure 4
shows the corresponding hierarchical relations among these grids.
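A two-level cycle of the kind listed in Figure 2 can be sketched for the periodic model problem as follows. This is only an illustration under stated assumptions: NumPy, dense matrices, a least-squares solve for the singular coarse systems, a damping factor of 2/3, and array indices standing in for the grid indices. All function names are hypothetical and not taken from the paper.

```python
import numpy as np

def periodic_laplacian(n, h):
    """3-point periodic difference operator on n points with spacing h."""
    I = np.eye(n)
    return (2.0 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)) / h**2

def purify(w):
    """Purification (3.11): remove the mean."""
    return w - w.mean()

def restrict(r, offset):
    """Full weighting onto the coarse grid of points offset, offset+2, ..."""
    full = 0.25 * (np.roll(r, 1) + 2.0 * r + np.roll(r, -1))
    return full[offset::2]

def interpolate(d, offset, n):
    """Linear interpolation from one coarse grid back to the n fine points."""
    fine = np.zeros(n)
    fine[offset::2] = d
    # Points not on this coarse grid get the average of their two neighbours.
    return fine + 0.5 * (np.roll(fine, 1) + np.roll(fine, -1))

def smooth(u, b, A, omega, sweeps):
    """Damped Jacobi sweeps."""
    diag = np.diag(A)
    for _ in range(sweeps):
        u = u + omega * (b - A @ u) / diag
    return u

def mcgmg_two_level(u, b, h, omega=2.0 / 3.0, m1=1, m2=1):
    n = u.size
    A = periodic_laplacian(n, h)
    A2 = periodic_laplacian(n // 2, 2.0 * h)
    u = smooth(u, b, A, omega, m1)                          # step 1
    r = b - A @ u                                           # step 2
    correction = np.zeros(n)
    for offset in (0, 1):                                   # both coarse grids
        rc = purify(restrict(r, offset))
        d = purify(np.linalg.lstsq(A2, rc, rcond=None)[0])  # step 3
        correction += interpolate(d, offset, n)
    u = u + 0.5 * correction                                # step 4
    u = smooth(u, b, A, omega, m2)                          # step 5
    return purify(u)

N = 16
h = 1.0 / N
rng = np.random.default_rng(0)
b = purify(rng.standard_normal(2 * N))     # consistent right-hand side
u = np.zeros(2 * N)
for _ in range(10):
    u = mcgmg_two_level(u, b, h)
ratio = (np.linalg.norm(b - periodic_laplacian(2 * N, h) @ u)
         / np.linalg.norm(b))
```

Because the two coarse grids are shifted copies of each other, averaging their corrections cancels the aliasing between low and high frequencies, which is why the residual falls quickly in this sketch.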

A multilevel MCGMG algorithm is similar to the two-level version except the
coarse grid problems in step 3 are solved by using algorithm MCGMG2L recursively.
For a better understanding of the multilevel MCGMG algorithm, we list a three-level
MCGMG algorithm in the following. For convenience of representation, we use the
symbol $v$ instead of $\delta$ to represent the solutions and $b$ to represent the right-hand side
vectors on all levels. The solutions on coarse grids should be thought of as corrections
to the solution of the fine grid.

Figure 3: Coarse Grids for an Extended Fine Grid: $N = 4$

Figure 4: Hierarchical Relations Among Grids: $N = 4$

Algorithm: MCGMG1D3L($A_h$, $v_h$, $b_h$)

1. Do $m_1$ smoothing iterations on $A_h v_h = b_h$ with initial guess $v_h$.

2. Compute

$$b_{2h}^{(\pm)} = \mathcal{P}\left(R_h^{(\pm)}(b_h - A_h v_h)\right).$$

3. Do $m_1$ smoothing iterations on

$$A_{2h}^{(+)} v_{2h}^{(+)} = b_{2h}^{(+)}, \qquad A_{2h}^{(-)} v_{2h}^{(-)} = b_{2h}^{(-)}$$

with initial guesses $v_{2h}^{(+)} = 0$ and $v_{2h}^{(-)} = 0$.

4. Compute

$$b_{4h}^{(+\pm)} = \mathcal{P}\left(R_{2h}^{(+\pm)}(b_{2h}^{(+)} - A_{2h}^{(+)} v_{2h}^{(+)})\right), \qquad b_{4h}^{(-\pm)} = \mathcal{P}\left(R_{2h}^{(-\pm)}(b_{2h}^{(-)} - A_{2h}^{(-)} v_{2h}^{(-)})\right).$$

5. Solve

$$A_{4h}^{(++)} v_{4h}^{(++)} = b_{4h}^{(++)}, \quad A_{4h}^{(+-)} v_{4h}^{(+-)} = b_{4h}^{(+-)}, \quad A_{4h}^{(-+)} v_{4h}^{(-+)} = b_{4h}^{(-+)}, \quad A_{4h}^{(--)} v_{4h}^{(--)} = b_{4h}^{(--)}.$$

6. Correct

$$v_{2h}^{(+)} \leftarrow v_{2h}^{(+)} + \tfrac{1}{2}\left(P_{2h}^{(++)} v_{4h}^{(++)} + P_{2h}^{(+-)} v_{4h}^{(+-)}\right), \qquad v_{2h}^{(-)} \leftarrow v_{2h}^{(-)} + \tfrac{1}{2}\left(P_{2h}^{(-+)} v_{4h}^{(-+)} + P_{2h}^{(--)} v_{4h}^{(--)}\right).$$

7. Do $m_2$ smoothing iterations on

$$A_{2h}^{(+)} v_{2h}^{(+)} = b_{2h}^{(+)}, \qquad A_{2h}^{(-)} v_{2h}^{(-)} = b_{2h}^{(-)}$$

with initial guesses $v_{2h}^{(+)}$ and $v_{2h}^{(-)}$, respectively, and purify the results if necessary.

8. Correct

$$v_h \leftarrow v_h + \tfrac{1}{2}\left(P_h^{(+)} v_{2h}^{(+)} + P_h^{(-)} v_{2h}^{(-)}\right).$$

9. Do $m_2$ smoothing iterations on $A_h v_h = b_h$ with initial guess $v_h$ and purify the
results if necessary.

Here we used the purification notation $\mathcal{P}$ defined in (3.11) and (3.12). In the
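case of $N = 4$, the two $2h$ coarse grid systems on the second level are given by

$$A_{2h}^{(+)} v_{2h}^{(+)} = \frac{1}{(2h)^2} \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} (v_{2h})_{-3} \\ (v_{2h})_{-1} \\ (v_{2h})_{1} \\ (v_{2h})_{3} \end{pmatrix} = \begin{pmatrix} (b_{2h})_{-3} \\ (b_{2h})_{-1} \\ (b_{2h})_{1} \\ (b_{2h})_{3} \end{pmatrix} = b_{2h}^{(+)} \tag{4.10}$$

and

$$A_{2h}^{(-)} v_{2h}^{(-)} = \frac{1}{(2h)^2} \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} (v_{2h})_{-2} \\ (v_{2h})_{0} \\ (v_{2h})_{2} \\ (v_{2h})_{4} \end{pmatrix} = \begin{pmatrix} (b_{2h})_{-2} \\ (b_{2h})_{0} \\ (b_{2h})_{2} \\ (b_{2h})_{4} \end{pmatrix} = b_{2h}^{(-)} \tag{4.11}$$

Here we use $v_{2h}$ and $b_{2h}$ to represent the fine grid vectors which consist of the coarse
grid vectors $v_{2h}^{(+)}$, $v_{2h}^{(-)}$ and $b_{2h}^{(+)}$, $b_{2h}^{(-)}$, respectively.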

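On the third level, the four $4h$ coarse grid systems are given by

$$A_{4h}^{(++)} v_{4h}^{(++)} = \frac{1}{(4h)^2} \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} (v_{4h})_{-3} \\ (v_{4h})_{1} \end{pmatrix} = \begin{pmatrix} (b_{4h})_{-3} \\ (b_{4h})_{1} \end{pmatrix} = b_{4h}^{(++)}, \tag{4.12}$$

$$A_{4h}^{(+-)} v_{4h}^{(+-)} = \frac{1}{(4h)^2} \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} (v_{4h})_{-1} \\ (v_{4h})_{3} \end{pmatrix} = \begin{pmatrix} (b_{4h})_{-1} \\ (b_{4h})_{3} \end{pmatrix} = b_{4h}^{(+-)}, \tag{4.13}$$

$$A_{4h}^{(-+)} v_{4h}^{(-+)} = \frac{1}{(4h)^2} \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} (v_{4h})_{-2} \\ (v_{4h})_{2} \end{pmatrix} = \begin{pmatrix} (b_{4h})_{-2} \\ (b_{4h})_{2} \end{pmatrix} = b_{4h}^{(-+)}, \tag{4.14}$$

$$A_{4h}^{(--)} v_{4h}^{(--)} = \frac{1}{(4h)^2} \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} (v_{4h})_{0} \\ (v_{4h})_{4} \end{pmatrix} = \begin{pmatrix} (b_{4h})_{0} \\ (b_{4h})_{4} \end{pmatrix} = b_{4h}^{(--)}. \tag{4.15}$$

Here each of the fine grid vectors $v_{4h}$ and $b_{4h}$ consists of four corresponding $4h$ coarse
grid vectors. On the third level, the grid points on a coarse grid are not always
distributed symmetrically about zero. The systems (4.12) and (4.13) may not be
consistent in general. However, one can make such a problem solvable by purifying
the right-hand side vector.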


4.2 Two Dimensional Case 

In the two dimensional region $[-1, 1]^2$ we can define a grid

$$\Omega_h = \{(x_j, y_k) \mid j, k = 1 - N, \ldots, N\} \tag{4.16}$$

where $x_j = jh$, $y_k = kh$ and $h = 1/N$. On this fine grid, the four coarse grids can be
defined as illustrated in Figure 5 in the case of $N = 4$.

A two-level MCGMG algorithm in 2D is a straightforward extension of the corresponding
two-level MCGMG algorithm in 1D defined in Figure 2. For a problem
$A_h u_h = b_h$ with a given initial guess $u_h^{(0)}$, a two-level MCGMG algorithm in 2D is
given in Figure 6.

As in the one dimensional case, a multilevel 2D MCGMG algorithm can be constructed
by recursively applying the two-level MCGMG method to each coarse grid
system until the process reaches the coarsest grid level or some preset grid level.

Figure 5: Coarse Grid Points for a 2D Problem with $h = 1/4$ (the four coarse grids $\Omega_{++}$, $\Omega_{-+}$, $\Omega_{+-}$, $\Omega_{--}$)

Algorithm: MCGMG2L($A_h$, $u_h^{(0)}$, $b_h$)

1. Do $m_1$ pre-smoothing iterations using the smoothing iterative method (e.g.,
damped Jacobi method) to obtain $u_h'$.

2. Compute the residual $r_h = b_h - A_h u_h'$, restrict the residual onto each of the four
coarse grids and perform purification if necessary to obtain

$$r_{2h}^{(s)}, \qquad s = ++, -+, +-, --.$$

3. Solve the coarse grid systems

$$A_{2h}^{(s)} \delta_{2h}^{(s)} = r_{2h}^{(s)}, \qquad s = ++, -+, +-, --,$$

for $\delta_{2h}^{(s)}$.

4. Purify $\delta_{2h}^{(s)}$ and interpolate the purified corrections onto the fine grid to
obtain the new approximate solution

$$\delta_{2h}^{(s)} = \mathcal{P}(\delta_{2h}^{(s)}, z_{2h}^{(s)}), \qquad s = ++, -+, +-, --,$$

$$u_h'' = u_h' + \frac{1}{4} \sum_{s} P_h^{(s)} \delta_{2h}^{(s)}.$$

5. Do $m_2$ post-smoothing iterations using the smoothing iterative method to obtain
and return $u_h^{(1)}$.

Figure 6: The 2D Two-Level MCGMG Algorithm
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5 Further Discussion 


A special version of the MCGMG algorithm is determined by the selection of the multigrid
operators. Operations such as restriction of residuals and interpolation of corrections
can be done in parallel at each grid level.

For instance, if one chooses a restriction operator defined by

$$\begin{aligned} (R_h \delta_h)(x,y) = \tfrac{1}{16}\big(&\delta_h(x-h,y+h) + 2\delta_h(x,y+h) + \delta_h(x+h,y+h) \\ &+ 2\delta_h(x-h,y) + 4\delta_h(x,y) + 2\delta_h(x+h,y) \\ &+ \delta_h(x-h,y-h) + 2\delta_h(x,y-h) + \delta_h(x+h,y-h)\big), \qquad (x,y) \in \Omega_h, \end{aligned} \tag{5.1}$$

and an interpolation operator defined by

$$(P_h \delta_{2h})(x,y) = \delta_{2h}(x,y), \tag{5.2}$$

then one will get a MCGMG algorithm which is equivalent to the parallel superconvergent
multigrid (PSMG) method of Frederickson and McBryan [3].

One can also construct a special version of MCGMG equivalent to the frequency
decomposition multigrid (FDMG) method of Hackbusch [4] by defining the coarse
grid matrices

$$A_{2h}^{(s)} = R_h^{(s)} A_h P_h^{(s)}, \qquad s = ++, -+, +-, -- \tag{5.3}$$

where the restriction operators are defined by

$$\begin{aligned} r_{2h}(x,y) = (R_h^{(++)} r_h)(x,y) = \tfrac{1}{4}\big(&r_h(x-h,y+h) + 2r_h(x,y+h) + r_h(x+h,y+h) \\ &+ 2r_h(x-h,y) + 4r_h(x,y) + 2r_h(x+h,y) \\ &+ r_h(x-h,y-h) + 2r_h(x,y-h) + r_h(x+h,y-h)\big), \qquad (x,y) \in \Omega_{++}, \end{aligned} \tag{5.4}$$

$$\begin{aligned} r_{2h}(x,y) = (R_h^{(-+)} r_h)(x,y) = \tfrac{1}{4}\big(&-r_h(x-h,y+h) + 2r_h(x,y+h) - r_h(x+h,y+h) \\ &- 2r_h(x-h,y) + 4r_h(x,y) - 2r_h(x+h,y) \\ &- r_h(x-h,y-h) + 2r_h(x,y-h) - r_h(x+h,y-h)\big), \qquad (x,y) \in \Omega_{-+}, \end{aligned} \tag{5.5}$$

$$\begin{aligned} r_{2h}(x,y) = (R_h^{(+-)} r_h)(x,y) = \tfrac{1}{4}\big(&-r_h(x-h,y+h) - 2r_h(x,y+h) - r_h(x+h,y+h) \\ &+ 2r_h(x-h,y) + 4r_h(x,y) + 2r_h(x+h,y) \\ &- r_h(x-h,y-h) - 2r_h(x,y-h) - r_h(x+h,y-h)\big), \qquad (x,y) \in \Omega_{+-}, \end{aligned} \tag{5.6}$$

$$\begin{aligned} r_{2h}(x,y) = (R_h^{(--)} r_h)(x,y) = \tfrac{1}{4}\big(&r_h(x-h,y+h) - 2r_h(x,y+h) + r_h(x+h,y+h) \\ &- 2r_h(x-h,y) + 4r_h(x,y) - 2r_h(x+h,y) \\ &+ r_h(x-h,y-h) - 2r_h(x,y-h) + r_h(x+h,y-h)\big), \qquad (x,y) \in \Omega_{--}, \end{aligned} \tag{5.7}$$

and the interpolation operators $P_h^{(s)}$ are defined by

$$(P_h^{(++)} \delta_{2h})(x,y) = \begin{cases} \delta_{2h}(x,y), & (x,y) \in \Omega_{++} \\ \tfrac{1}{2}\left(\delta_{2h}(x-h,y) + \delta_{2h}(x+h,y)\right), & (x,y) \in \Omega_{-+} \\ \tfrac{1}{2}\left(\delta_{2h}(x,y-h) + \delta_{2h}(x,y+h)\right), & (x,y) \in \Omega_{+-} \\ \tfrac{1}{4}\left(\delta_{2h}(x-h,y-h) + \delta_{2h}(x-h,y+h) + \delta_{2h}(x+h,y-h) + \delta_{2h}(x+h,y+h)\right), & (x,y) \in \Omega_{--} \end{cases} \tag{5.8}$$

$$(P_h^{(-+)} \delta_{2h})(x,y) = \begin{cases} \delta_{2h}(x,y), & (x,y) \in \Omega_{-+} \\ -\tfrac{1}{2}\left(\delta_{2h}(x-h,y) + \delta_{2h}(x+h,y)\right), & (x,y) \in \Omega_{++} \\ \tfrac{1}{2}\left(\delta_{2h}(x,y-h) + \delta_{2h}(x,y+h)\right), & (x,y) \in \Omega_{--} \\ -\tfrac{1}{4}\left(\delta_{2h}(x-h,y-h) + \delta_{2h}(x-h,y+h) + \delta_{2h}(x+h,y-h) + \delta_{2h}(x+h,y+h)\right), & (x,y) \in \Omega_{+-} \end{cases} \tag{5.9}$$

$$(P_h^{(+-)} \delta_{2h})(x,y) = \begin{cases} \delta_{2h}(x,y), & (x,y) \in \Omega_{+-} \\ \tfrac{1}{2}\left(\delta_{2h}(x-h,y) + \delta_{2h}(x+h,y)\right), & (x,y) \in \Omega_{--} \\ -\tfrac{1}{2}\left(\delta_{2h}(x,y-h) + \delta_{2h}(x,y+h)\right), & (x,y) \in \Omega_{++} \\ -\tfrac{1}{4}\left(\delta_{2h}(x-h,y-h) + \delta_{2h}(x-h,y+h) + \delta_{2h}(x+h,y-h) + \delta_{2h}(x+h,y+h)\right), & (x,y) \in \Omega_{-+} \end{cases} \tag{5.10}$$

$$(P_h^{(--)} \delta_{2h})(x,y) = \begin{cases} \delta_{2h}(x,y), & (x,y) \in \Omega_{--} \\ -\tfrac{1}{2}\left(\delta_{2h}(x-h,y) + \delta_{2h}(x+h,y)\right), & (x,y) \in \Omega_{+-} \\ -\tfrac{1}{2}\left(\delta_{2h}(x,y-h) + \delta_{2h}(x,y+h)\right), & (x,y) \in \Omega_{-+} \\ \tfrac{1}{4}\left(\delta_{2h}(x-h,y-h) + \delta_{2h}(x-h,y+h) + \delta_{2h}(x+h,y-h) + \delta_{2h}(x+h,y+h)\right), & (x,y) \in \Omega_{++} \end{cases} \tag{5.11}$$

corresponding to the four coarse grids $\Omega_{++}$, $\Omega_{-+}$, $\Omega_{+-}$ and $\Omega_{--}$, respectively.

We used the MCGMG method to solve a test problem defined by (2.12) to (2.15)
with the boundary function $g(x,y) = 1 + xy$ and grid size $h = 1/64$. The restriction
operators and the interpolation operators are defined by (5.1) and (5.2) respectively.
A damped Jacobi method is used for smoothing with the damping factor 0.8. For
comparison, we also ran the same problem using the standard multigrid (SMG) method with full
weighting restriction of residuals and bilinear interpolation of corrections. Table
1 lists the observed convergence factors, which are the average values over 3 cycles. The
number of grid levels is 6; $m_1$ and $m_2$ are the numbers of pre-smoothing and
post-smoothing iterations, respectively. The results indicate that the observed convergence factors
of the MCGMG method are much smaller than the corresponding ones of the standard
multigrid method.

Table 1: Observed Numerical Convergence Factors

$(m_1, m_2)$    MCGMG    SMG
(0, 1)          0.15     0.53
(1, 1)          0.11     0.36
(1, 2)          0.08     0.23
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SUMMARY 

The nonlinear multigrid is an efficient algorithm for solving the system of nonlinear equations 
arising from the numerical discretization of nonlinear elliptic boundary problems [7],[9]. In this 
paper, we present a new nonlinear multigrid analysis as an extension of the linear multigrid 
theory presented by Bramble, et al. in [5], [6], and [17]. In particular, we prove the convergence 
of the nonlinear V-cycle method for a class of mildly nonlinear second order elliptic boundary 
value problems which do not have full elliptic regularity. 

INTRODUCTION 


Multigrid methods have been used extensively to solve linear systems of equations which arise 
in the numerical discretization of linear partial differential equations. We call such multigrid 
methods “linear multigrid methods” in this paper. With the development of the linear multigrid 
methods, the multigrid technique also has been applied to the numerical solution of nonlinear 
boundary value problems. Two important algorithms have been proposed so far. One is Newton- 
multigrid iteration, in which a linear multigrid method is used to solve the linear system that 
arises from a Newton iterative method [4]. The other one is the nonlinear multigrid method, 
which is an extension of the linear multigrid method to the nonlinear case [9]. In literature, it 
is also referred to as the Full Approximation Scheme (FAS) by Brandt in [7]. The convergence 
of the nonlinear multigrid method was first studied by Hackbusch in [9] and later by Reusken 
in [11] and [12]. Hackbusch’s nonlinear multigrid theory is based on his linear multigrid theory, 
while Reusken’s analysis is based on the linear multigrid analysis in [3]. 

Recently, Bramble, et al. have established a new linear multigrid theory [5] [6] [17] that has 
generalized the work in [3] and [9] in another way. Using this new multigrid theory, they have 
proved the convergence of linear multigrid methods with non-nested spaces or non-inherited 
quadratic forms, even with weak or no regularity assumptions. The purpose of this paper is to 
extend this new linear multigrid theory to the nonlinear case. 

In this paper, we present the framework of our new multigrid theory. In particular, we prove
a basic convergence theorem for the nonlinear V-cycle scheme based on two abstract conditions,
which are referred to as the “smoothing assumption” and the “approximation assumption”.
We then apply it to show the convergence of the nonlinear V-cycle method with the damped-
Jacobi-Newton smoother for a class of mildly nonlinear second order elliptic boundary value
problems which do not have full elliptic regularity. Moreover, our new approach makes it possible
to analyze the nonlinear multigrid method in more complicated cases, such as nonnested
spaces, noninherited quadratic forms, numerical integration, and weak or no regularity
assumptions. We have shown the convergence of the nonlinear V-cycle method disturbed by
numerical quadratures in [14]. We intend to study other cases in subsequent work.

*This work was supported in part by the National Science Foundation through award number DMS-9105437
at the University of Houston.

In comparison to the linear multigrid method, the nonlinear multigrid method has two ad¬ 
ditional parameters. In practice, their choice is an important issue. We investigate this issue 
numerically through a model problem in this paper. We note that this model problem, in part, 
aids in the understanding of the solution procedures used in the code UHBD [IG]. 

The outline of the remainder of the paper is as follows. In Section 2, we introduce the basic
idea of our nonlinear multigrid analysis. In Section 3, we present a general convergence theorem
of the nonlinear V-cycle method based on two abstract assumptions, the smoothing assumption
and the approximation assumption. In Section 4, we apply the theory of Section 3 to show the
convergence of the nonlinear multigrid method for a class of mildly nonlinear elliptic boundary
value problems. In Section 5, we present numerical experiments with the nonlinear multigrid
method focusing on its two auxiliary parameters.

THE NONLINEAR MULTIGRID METHOD 


We consider a nonlinear variational problem coming from a nonlinear elliptic boundary value
problem with domain $D$ as follows: Find $u \in H$, such that

$$a(u, v) = 0 \qquad \forall v \in H, \tag{1}$$

where $H = H(D)$ is an abstract Hilbert space with inner product $(\cdot, \cdot)$, and $a(\cdot, \cdot)$ is nonlinear
only with respect to the first variable.

We assume that $a(u, v)$ is $H$-bounded, that is, there exists a constant $C$, such that

$$|a(u, v)| \le C(1 + \|u\|)\|v\| \qquad \forall u, v \in H,$$

where $\|u\| = \sqrt{(u, u)}$. Using the Riesz representation theorem [1], we then write (1) as

$$g(u) = 0, \tag{2}$$

where $g : H \to H$ is the nonlinear operator such that

$$a(u, v) = (g(u), v) \qquad \forall v \in H.$$

We make another assumption on $g$ below:

A1) $g$ is Fréchet-differentiable on $H$, and the derivative of $g$ at $u$, denoted by $Dg(u)$, is a
symmetric, positive definite, bounded linear operator from $H$ to itself.

From A1) it follows that Equation (2) has the unique solution $u^*$ [16].

Let $\mathcal{U} \subset H$ be a neighborhood of $u^*$ and $\mathcal{F}$ be the image of $\mathcal{U}$ under $g$. Since $g$ satisfies
the above assumptions, the implicit function theorem [1] implies that $g : \mathcal{U} \to \mathcal{F}$ is a
homeomorphism. Thus, for any $f \in \mathcal{F}$, there exists a unique $u \in \mathcal{U}$ such that the following equation
holds:

$$g(u) = f. \tag{3}$$

Hence, we may consider equation (3) in the following.

Let $u^{old}$ be an approximate solution of (3). The update $u^{new}$ of $u^{old}$ is defined by

$$u^{new} = u^{old} + q,$$

with $q$ being a correction term satisfying the following correction equation:

$$g(q + u^{old}) = f. \tag{4}$$

If $q$ is an exact solution of (4), then a direct method for solving (3) is derived. But solving (4)
is as difficult as solving (3), so we often construct an approximate operator $R$ of $g^{-1}$ to simplify
the computational work.

In the linear case, the correction equation (4) is often written as

$$g(q) = f - g(u^{old}), \tag{5}$$

and the term $f - g(u^{old})$ is often referred to as the residual of $u^{old}$. Clearly, if the operator $R$ is
defined by a linear iterative algorithm, then the linear iteration can be written as follows:

$$u^{new} = u^{old} + R[f - g(u^{old})]. \tag{6}$$

A key factor in the new linear multigrid theory in [5], [6] and [17] is the introduction of the
operator $R$ that characterizes the linear multigrid method, so the linear multigrid method can
be expressed in form (6).

However, when $g$ is nonlinear, the correction equation (4) cannot be written as (5). Noting
the important role of the residual term in the context of the multigrid method, we introduce an
“approximate” correction equation of (4) as follows:

$$g(s\tilde{q} + \tilde{u}) = \tilde{f} + s[f - g(u^{old})], \tag{7}$$

where $\tilde{f} = g(\tilde{u})$, $s$ is a given positive number and $\tilde{u}$ a given vector. Both $s$ and $\tilde{u}$ are extra
parameters, compared to the linear multigrid method, and they are chosen so that $\tilde{q}$ approximates
the solution $q$ of (4) in some sense. Hence, the nonlinear multigrid method can be expressed by

$$u^{new} = u^{old} + \left[R(\tilde{f} + s[f - g(u^{old})]) - \tilde{u}\right]/s, \tag{8}$$

provided that the operator $R$ is defined by the nonlinear multigrid iterative algorithm for solving
$g(u) = f$. This is the main idea of our nonlinear multigrid analysis.

In the linear case, we can simply set $\tilde{u} = \tilde{f} = 0$ and $s = 1$. Thus, (8) reduces to (6). In
this sense, the nonlinear multigrid method defined by (8) is an extension of the linear multigrid
method.
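The update (8) can be illustrated on a scalar problem. In the sketch below everything is an assumption made for illustration: the model operator $g(u) = u + u^3$, the choice $\tilde{u} = u^{old}$ with $s = 1$, and a stand-in for $R$ made of two Newton steps (the paper's $R$ is the nonlinear multigrid operator, which is not reproduced here).

```python
def g(u):
    """Hypothetical scalar model operator."""
    return u + u**3

def dg(u):
    """Derivative of the model operator."""
    return 1.0 + 3.0 * u**2

def approx_inverse(rhs, start, steps=2):
    """Stand-in for R: a few Newton steps on g(w) = rhs, started at `start`."""
    w = start
    for _ in range(steps):
        w -= (g(w) - rhs) / dg(w)
    return w

def nonlinear_update(u_old, f, u_tilde, s):
    """One application of (8) with the stand-in R above."""
    f_tilde = g(u_tilde)
    rhs = f_tilde + s * (f - g(u_old))
    return u_old + (approx_inverse(rhs, u_tilde) - u_tilde) / s

def solve(f, u0=0.0, s=1.0, iters=20):
    u = u0
    for _ in range(iters):
        # Assumed choice: the auxiliary vector is the current iterate.
        u = nonlinear_update(u, f, u_tilde=u, s=s)
    return u

u = solve(2.0)   # g(1) = 2, so the exact solution is u = 1
```

With $\tilde{u} = u^{old}$ and $s = 1$ the right-hand side of (7) collapses to $f$, so an exact $R = g^{-1}$ would return the solution in one step; the approximate $R$ used here turns (8) into an iteration.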

To define a nonlinear multigrid operator, we need some further notation given below.

Let $H$ be a finite element space with grid size $h$. Suppose that we have subspaces $M_k$ with
inner product $(\cdot, \cdot)_k$ satisfying

$$M_1 \subset M_2 \subset \cdots \subset M_l = H.$$

Set $g_l = g$ and define the nonlinear operator $g_k : M_k \to M_k$ by

$$(g_k(u), v)_k = a(u, v), \qquad \forall v \in M_k, \quad k = 1, 2, \ldots, l - 1. \tag{9}$$

We define a projector $Q_k : M_{k+1} \to M_k$ by

$$(Q_k u, v)_k = (u, v)_{k+1}, \qquad \forall v \in M_k.$$

Obviously, $g_k$ satisfies Assumption A1), so there exist $\mathcal{U}_k$ and $\mathcal{F}_k$ such that $g_k$ is a
homeomorphism between them. Hence, for $f_k \in \mathcal{F}_k$, we may consider the following equation

$$g_k(u) = f_k, \tag{10}$$

and its solution is denoted by $u_k^*$.

The smoothing process on $M_k$ is denoted by the operator

$$S_k^m(\cdot\,; f_k) : M_k \to M_k \tag{11}$$

satisfying $u_k^* = S_k^m(u_k^*; f_k)$. We assume that $S_k^m$ is Fréchet-differentiable on $M_k$. Here $m$ indicates
that $S_k^m$ may be defined by $m$ steps of a nonlinear relaxation iteration (e.g., the damped-Jacobi-
Newton or the Gauss-Seidel-Newton [13]). Without confusion, we denote $S_k^m(u; f_k)$ as $S_k^m(u)$.

Denote $\mathcal{E}_k = \{\zeta \mid \zeta = \tilde{f}_k + s_k[f_k - g_k(u_k)] \text{ for all } u_k \in M_k\}$. Here $\tilde{u}_k$, $s_k$ and $f_k$ are fixed, and
$\tilde{f}_k = g_k(\tilde{u}_k)$. We define the nonlinear multigrid operator $B_k$ on $\mathcal{E}_k$ inductively in the following
algorithm:
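As a concrete instance of a smoother with the fixed-point property $u_k^* = S_k^m(u_k^*; f_k)$, the following sketch applies damped Jacobi-Newton sweeps to a discrete model operator. The model problem (a 3-point discretization of $-u'' + u^3$ with zero Dirichlet values), the damping factor 0.8, and all names are assumptions for illustration, not the paper's test problem.

```python
import numpy as np

def g(u, h):
    """Hypothetical discrete operator: 3-point Laplacian plus a cubic term."""
    up = np.concatenate(([0.0], u, [0.0]))          # zero Dirichlet values
    return (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2 + u**3

def jacobi_newton(u, f, h, m, omega=0.8):
    """m damped Jacobi-Newton sweeps for g(u) = f."""
    for _ in range(m):
        diag = 2.0 / h**2 + 3.0 * u**2              # diagonal of the Jacobian Dg(u)
        u = u + omega * (f - g(u, h)) / diag
    return u

h = 1.0 / 16
x = h * np.arange(1, 16)
u_star = np.sin(np.pi * x)
f = g(u_star, h)                                    # manufactured right-hand side
fixed = jacobi_newton(u_star, f, h, m=3)            # should leave u_star unchanged
```

Since the residual $f - g(u^*)$ vanishes, every sweep leaves $u^*$ in place, which is the consistency requirement placed on $S_k^m$ in (11).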

Algorithm 1 Given positive integers $m_1$, $m_2$ and $p$.

0) $B_1 = g_1^{-1}$.

For each $\zeta_k \in \mathcal{E}_k$ with $k > 1$, there exists a $u_k \in M_k$ such that $\zeta_k = \tilde{f}_k + s_k[f_k - g_k(u_k)]$.
We define $B_k(\zeta_k)$ in terms of $B_{k-1}$ as follows:

1) Pre-smoothing: $v_1 = S_k^{m_1}(u_k; f_k)$.

2) Coarse grid correction: $v_2 = v_1 + (q_p - \tilde{u}_{k-1})/s_{k-1}$,
where $q_p$ is defined by (12).

$$q_i = q_{i-1} + \left[B_{k-1}(\tilde{f}_{k-1} + s_{k-1}[\bar{f}_{k-1} - g_{k-1}(q_{i-1})]) - \tilde{u}_{k-1}\right]/s_{k-1}, \tag{12}$$

for $i = 1, 2, \ldots, p$. Here $q_0 = \tilde{u}_{k-1}$, and

$$\bar{f}_{k-1} = \tilde{f}_{k-1} + s_{k-1} Q_{k-1}[f_k - g_k(v_1)]. \tag{13}$$

3) Post-smoothing:

$$B_k(\zeta_k) = s_k[S_k^{m_2}(v_2; f_k) - u_k] + \tilde{u}_k. \tag{14}$$

We note that Algorithm 1 using $\tilde{u}_k = \tilde{f}_k = 0$, $s_k = 1$, and $p = 1$ reduces to the linear
multigrid algorithm described in [5], [6] and [17] provided that $g$ is linear.

THE CONVERGENCE ANALYSIS 

In our nonlinear multigrid analysis, we need a new inner product $b_k(u, v)$ defined by

$$b_k(u, v) = (Dg_k(u_k^*)u, v)_k, \qquad \forall u, v \in M_k.$$

From Assumption A1) we see that $b_k(u, v)$ is symmetric, positive definite.

With this new inner product, we define an orthogonal operator $P_k : M_{k+1} \to M_k$ by

$$b_k(P_k u, v) = b_{k+1}(u, v), \qquad \forall v \in M_k.$$

From the definitions of $Q_k$ and $P_k$ an important equality follows:

$$Q_{k-1} Dg_k(u_k^*) = Dg_{k-1}(u_{k-1}^*) P_{k-1}, \qquad k = 2, \ldots, l. \tag{15}$$

Using the nonlinear multigrid operator $B_k$, we define the nonlinear multigrid method as
follows:

$$u_k^{i+1} = \psi_k(u_k^i), \qquad i = 0, 1, 2, \ldots, \tag{16}$$

with the operator $\psi_k : M_k \to M_k$ being defined by

$$\psi_k(u_k) = u_k + \left[B_k(\tilde{f}_k + s_k[f_k - g_k(u_k)]) - \tilde{u}_k\right]/s_k. \tag{17}$$

Noting that $g_k(u_k^*) = f_k$ and $S_k^{m_i}(u_k^*; f_k) = u_k^*$ for $i = 1, 2$, we can show by induction that

$$B_k(\tilde{f}_k) = \tilde{u}_k. \tag{18}$$

Thus, the scheme (16) is consistent in the sense that $u_k^*$ is a fixed point of the sequence $\{u_k^i\}$.

A fundamental recurrence relation with respect to the nonlinear multigrid operators $B_k$ is
given in the following theorem.

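Theorem 1 The fundamental recurrence relation for the nonlinear multigrid operators $B_k$,
defined by Algorithm 1, is

$$\begin{aligned} I - DB_k(\tilde{f}_k)Dg_k(u_k^*) = DS_k^{m_2}(u_k^*)\big\{I - {}&[I - (I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1}))^p] \\ &Dg_{k-1}(\tilde{u}_{k-1})^{-1}Dg_{k-1}(u_{k-1}^*)P_{k-1}\big\}DS_k^{m_1}(u_k^*), \end{aligned} \tag{19}$$

where $k = 1, 2, \ldots$, and $u_k^*$ is a solution of $g_k(u_k) = f_k$ on $M_k$.

Proof Using (14), we immediately get the following equality:

$$u_k + \left[B_k(\tilde{f}_k + s_k[f_k - g_k(u_k)]) - \tilde{u}_k\right]/s_k = S_k^{m_2}\left(S_k^{m_1}(u_k) + (q_p - \tilde{u}_{k-1})/s_{k-1}\right), \qquad \forall u_k \in M_k. \tag{20}$$

From the expression (13) of $\bar{f}_{k-1}$ it follows that

$$\bar{f}_{k-1}(u_k^*) = \tilde{f}_{k-1}. \tag{21}$$

Then, by induction and (18), we can show that

$$q_i(u_k^*) = \tilde{u}_{k-1}, \qquad \text{for } i = 0, 1, 2, \ldots, p. \tag{22}$$

Thus, differentiating with respect to $u_k$ at $u_k^*$ on both sides of the equality (20), and using (22),
we get

$$I - DB_k(\tilde{f}_k)Dg_k(u_k^*) = DS_k^{m_2}(u_k^*)\left[DS_k^{m_1}(u_k^*) + Dq_p(u_k^*)/s_{k-1}\right]. \tag{23}$$

Here the operations are based on the calculus in Hilbert space [1].

Using (21) and (22), we see that

$$Dq_i(u_k^*) = [I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1})]Dq_{i-1}(u_k^*) + DB_{k-1}(\tilde{f}_{k-1})D\bar{f}_{k-1}(u_k^*).$$

In addition, with (13) and (15),

$$D\bar{f}_{k-1}(u_k^*) = -s_{k-1}Q_{k-1}Dg_k(u_k^*)DS_k^{m_1}(u_k^*) = -s_{k-1}Dg_{k-1}(u_{k-1}^*)P_{k-1}DS_k^{m_1}(u_k^*). \tag{24}$$

Hence,

$$\begin{aligned} Dq_p(u_k^*) &= \{I + [I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1})] + \cdots \\ &\qquad + [I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1})]^{p-1}\}DB_{k-1}(\tilde{f}_{k-1})D\bar{f}_{k-1}(u_k^*) \\ &= [I - (I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1}))^p]Dg_{k-1}(\tilde{u}_{k-1})^{-1}D\bar{f}_{k-1}(u_k^*) \\ &= -s_{k-1}[I - (I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(\tilde{u}_{k-1}))^p]Dg_{k-1}(\tilde{u}_{k-1})^{-1}Dg_{k-1}(u_{k-1}^*)P_{k-1}DS_k^{m_1}(u_k^*). \end{aligned} \tag{25}$$

Therefore, the equality (19) follows by substituting (25) into (23). □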
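The middle step of (25) rests on the geometric-series identity $\sum_{j=0}^{p-1} X^j B = (I - X^p)G^{-1}$ whenever $X = I - BG$. A quick numerical check, assuming NumPy and using random matrices as hypothetical stand-ins for $DB_{k-1}$ (here `B`) and $Dg_{k-1}$ (here `G`):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3

M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)                  # symmetric positive definite stand-in for Dg
X = 0.1 * rng.standard_normal((n, n))        # stand-in for I - DB Dg
B = (np.eye(n) - X) @ np.linalg.inv(G)       # chosen so that I - B G = X

# Left side: {I + X + ... + X^(p-1)} B, as in the first line of (25).
lhs = sum(np.linalg.matrix_power(X, j) for j in range(p)) @ B
# Right side: [I - X^p] G^{-1}, as in the second line of (25).
rhs = (np.eye(n) - np.linalg.matrix_power(X, p)) @ np.linalg.inv(G)
```

The two sides agree to rounding error, which is the algebra that converts the sum over $p$ coarse-grid iterations into the closed form appearing in (19).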

The schemes (16) with $p = 1$ and 2 are often used in practice. We refer to them as the
V-cycle and the W-cycle methods, respectively. In this paper, we only consider the convergence
of the nonlinear V-cycle method. The discussion of the other cases is similar.

Setting $p = 1$ in (19), we immediately get a fundamental recursion relation of the V-cycle:

$$I - DB_k(\tilde{f}_k)Dg_k(u_k^*) = DS_k^{m_2}(u_k^*)[I - DB_{k-1}(\tilde{f}_{k-1})Dg_{k-1}(u_{k-1}^*)P_{k-1}]DS_k^{m_1}(u_k^*). \tag{26}$$

From the definition of $b_k(\cdot, \cdot)$, it follows that the inequality $b_k(u, u) \le b_{k-1}(u, u)$ may not hold
for some $u \in M_{k-1}$. Thus, operator $I - DB_k(\tilde{f}_k)Dg_k(u_k^*)$ may be negative with respect to the
inner product $b_k(\cdot, \cdot)$. To show the convergence of the V-cycle, it is sufficient to prove that there
exists a constant $\eta_k$ in $[0, 1)$, independent of $h_k$, such that

$$|b_k([I - DB_k(\tilde{f}_k)Dg_k(u_k^*)]u, u)| \le \eta_k b_k(u, u), \qquad \forall u \in M_k. \tag{27}$$

The following two basic assumptions are made to show (27):

$$|b_k((I - P_{k-1})u, u)| \le C_\beta^2 \left(\frac{\|Dg_k(u_k^*)u\|_k^2}{\lambda_k}\right)^{\beta} b_k(u, u)^{1-\beta}, \qquad \forall u \in M_k, \tag{28}$$

$$\frac{\|Dg_k(u_k^*)u\|_k^2}{\lambda_k} \le C_S\, b_k([I - DS_k(u_k^*)]u, u), \qquad \forall u \in M_k, \tag{29}$$

where $\lambda_k$ is the largest eigenvalue of $Dg_k(u_k^*)$, and $0 < \beta \le 1$. (28) and (29) are referred to as
“the regularity and approximation assumption” and “the smoothing assumption”, respectively.
The following theorem provides an estimation for a value of the parameter $\eta_k$.

Theorem 2 Let $B_k$ be defined by Algorithm 1 with $p = 1$ and $m_1 = m_2 = m$. Assume that

a) Assumptions (28) and (29) hold.

b) The smoothing process $S_k^m$ is formed by $m$ steps of the nonlinear relaxation method $S_k$,
such that $DS_k(u_k^*)$ is symmetric and non-negative with respect to inner product $b_k(\cdot, \cdot)$, and

$$DS_k^m(u_k^*) = [DS_k(u_k^*)]^m.$$

c) The auxiliary vector $\tilde{u}_k = u_k^*$.

Then there exist two constants, independent of $h_k$,

$$\eta_{k,1} = \frac{M(k)}{m^\beta + M(k)} \qquad \text{and} \qquad \eta_{k,2} = 1 - \left(1 + \frac{C_\beta^2 C_S^\beta}{(2m)^\beta}\right)^{k},$$

such that

$$\eta_{k,2}\, b_k(u, u) \le b_k([I - DB_k(\tilde{f}_k)Dg_k(u_k^*)]u, u) \le \eta_{k,1}\, b_k(u, u), \qquad \forall u \in M_k. \tag{30}$$

Furthermore, if $m$ is sufficiently large, then the estimate (27) holds with

$$\eta_k = \max\{|\eta_{k,1}|, |\eta_{k,2}|\} < 1.$$

Here $M(k)$ is a positive constant related to $C_\beta$, $C_S$, $m$, $\beta$ and $k$. Its detailed expression can be
found in Theorem 1 of [5].

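Proof With $b_k(DS_k^m(u_k^*)u,v) = b_k(u,DS_k^m(u_k^*)v)$, (26) and the definition of $P_{k-1}$, we have

    $b_k([I - DB_k(f_k)Dg_k(u_k^*)]u,u) = b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $\quad + b_{k-1}([I - DB_{k-1}(f_{k-1})Dg_{k-1}(u_{k-1}^*)]P_{k-1}DS_k^m(u_k^*)u,\, P_{k-1}DS_k^m(u_k^*)u).$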

We now show (30) by induction on k. For fc = 1, we have Bi ^ gf^ and ui = Uy Thus, 

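    $|b_1([I - DB_1(f_1)Dg_1(u_1^*)]u,u)| = 0.$

Suppose (30) holds for k - 1. We first prove the right hand side of (30). By induction,

    $b_k([I - DB_k(f_k)Dg_k(u_k^*)]u,u)$
    $\le b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) + \eta_{k-1,1}\, b_{k-1}(P_{k-1}DS_k^m(u_k^*)u,\, P_{k-1}DS_k^m(u_k^*)u)$
    $= b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) + \eta_{k-1,1}\, b_k(P_{k-1}DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $= (1 - \eta_{k-1,1})\, b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) + \eta_{k-1,1}\, b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u).$
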

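By (28), (29) and the generalized arithmetic mean inequality, for any $r_k > 0$,

    $b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $\le C_\beta^2 \left( \frac{\|Dg_k(u_k^*)DS_k^m(u_k^*)u\|_k^2}{\lambda_k} \right)^{\beta} [b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)]^{1-\beta}$
    $\le C_\beta^2 \left[ \beta r_k \frac{\|Dg_k(u_k^*)DS_k^m(u_k^*)u\|_k^2}{\lambda_k} + (1-\beta)\, r_k^{-\beta/(1-\beta)}\, b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) \right]$
    $\le C_\beta^2 \left[ \beta r_k C_S\, b_k([I - DS_k(u_k^*)]DS_k^{2m}(u_k^*)u,\, u) + (1-\beta)\, r_k^{-\beta/(1-\beta)}\, b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) \right]$
    $\le C_\beta^2 \left[ \frac{\beta r_k C_S}{2m}\, b_k([I - DS_k^{2m}(u_k^*)]u,\, u) + (1-\beta)\, r_k^{-\beta/(1-\beta)}\, b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) \right].$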

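Combining the above inequalities gives

    $b_k([I - DB_k(f_k)Dg_k(u_k^*)]u,u)$
    $\le \left[ (1 - \eta_{k-1,1})\, C_\beta^2 (1-\beta)\, r_k^{-\beta/(1-\beta)} + \eta_{k-1,1} \right] b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $\quad + (1 - \eta_{k-1,1})\, C_\beta^2 C_S \frac{\beta}{2m}\, r_k\, b_k([I - DS_k^{2m}(u_k^*)]u,\, u).$

Now, with the same proof as that in the proof of Theorem 1 of [5], we have that

    $(1 - \eta_{k-1,1})\, C_\beta^2 (1-\beta)\, r_k^{-\beta/(1-\beta)} + \eta_{k-1,1} \le \eta_{k,1},$

    $(1 - \eta_{k-1,1})\, C_\beta^2 C_S \frac{\beta}{2m}\, r_k \le \eta_{k,1}.$

This completes the proof of the right hand side of (30).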
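
The "generalized arithmetic mean inequality" used in this step is the weighted form $x^\beta y^{1-\beta} \le \beta r x + (1-\beta) r^{-\beta/(1-\beta)} y$ for $x, y \ge 0$, $r > 0$ and $0 < \beta < 1$. The following is only a minimal numerical sanity check of that inequality, not part of the proof; the function names and the sample values are illustrative assumptions.

```python
import itertools


def weighted_am_gm_gap(x, y, beta, r):
    """Right-hand side minus left-hand side of
    x**beta * y**(1-beta) <= beta*r*x + (1-beta)*r**(-beta/(1-beta))*y."""
    rhs = beta * r * x + (1.0 - beta) * r ** (-beta / (1.0 - beta)) * y
    lhs = x ** beta * y ** (1.0 - beta)
    return rhs - lhs


def min_gap():
    """Smallest gap over a small, arbitrarily chosen grid of sample values."""
    values = [0.1, 1.0, 7.5]      # hypothetical x and y samples
    betas = [0.25, 0.5, 0.9]      # exponents in (0, 1)
    scales = [0.2, 1.0, 5.0]      # free parameter r > 0
    return min(
        weighted_am_gm_gap(x, y, beta, r)
        for x, y, beta, r in itertools.product(values, values, betas, scales)
    )


if __name__ == "__main__":
    print("smallest gap:", min_gap())
```

The gap vanishes when $r x = r^{-\beta/(1-\beta)} y$, which is why the proof is free to tune $r_k$ to balance the two terms.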

We next prove the left hand side of (30). From the spectral properties of $DS_k(u_k^*)$, it follows that

    $b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) \le b_k(u,u), \quad k = 1,2,\ldots,l.$    (31)

Combining (31) and assumptions (28) and (29) gives

    $-b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $\le \frac{C_\beta^2 C_S^\beta}{(2m)^\beta}\, [b_k(u,u) - b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)]^{\beta}\, [b_k(u,u)]^{1-\beta} \le \frac{C_\beta^2 C_S^\beta}{(2m)^\beta}\, b_k(u,u),$

where we have used the following inequality (which is similar to (3.16) in [5]):

    $b_k([I - DS_k(u_k^*)]DS_k^{2m}(u_k^*)u,\, u) \le \frac{1}{2m}\, b_k([I - DS_k^{2m}(u_k^*)]u,\, u).$



Let $T_k = \left(1 + \frac{C_\beta^2 C_S^\beta}{(2m)^\beta}\right)^{k}$. By the induction assumption, we have

    $b_{k-1}([I - DB_{k-1}(f_{k-1})Dg_{k-1}(u_{k-1}^*)]u,u) \ge (1 - T_{k-1})\, b_{k-1}(u,u),$

which can be written as

    $-b_{k-1}([I - DB_{k-1}(f_{k-1})Dg_{k-1}(u_{k-1}^*)]u,u) \le -\eta_{k-1,2}\, b_{k-1}(u,u).$

Then, from the above inequalities, we obtain

    $-b_k([I - DB_k(f_k)Dg_k(u_k^*)]u,u)$
    $= -b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) - b_{k-1}([I - DB_{k-1}(f_{k-1})Dg_{k-1}(u_{k-1}^*)]P_{k-1}DS_k^m(u_k^*)u,\, P_{k-1}DS_k^m(u_k^*)u)$
    $\le -b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) - \eta_{k-1,2}\, b_k(P_{k-1}DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $= -(1 - \eta_{k-1,2})\, b_k((I - P_{k-1})DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u) - \eta_{k-1,2}\, b_k(DS_k^m(u_k^*)u,\, DS_k^m(u_k^*)u)$
    $\le \left[ \frac{C_\beta^2 C_S^\beta}{(2m)^\beta}\, T_{k-1} + T_{k-1} - 1 \right] b_k(u,u) = (T_k - 1)\, b_k(u,u) = -\eta_{k,2}\, b_k(u,u).$

The proof of the left hand side of (30) is completed. □

With Theorem 2, we can now obtain a convergence theorem for the nonlinear V-cycle.

Theorem 3 Let $\{u_k^i\}$ be a sequence of iterates of the nonlinear multigrid V-cycle algorithm, and let $u_k^*$ be a solution of the equation $g_k(u) = f_k$. If the assumptions in Theorem 2 hold, and m is sufficiently large, then there exists a constant $\sigma_k$ with $0 < \sigma_k < 1$, independent of the grid size $h_k$, and a neighborhood $O(u_k^*, \epsilon_k)$ of $u_k^*$, such that all $u_k^i \in O(u_k^*, \epsilon_k)$, and

    $\|u_k^{i+1} - u_k^*\|_{b,k} \le \sigma_k \|u_k^i - u_k^*\|_{b,k}, \quad i = 0,1,2,\ldots,$

when the initial guess $u_k^0 \in O(u_k^*, \epsilon_k)$. Here $\|\cdot\|_{b,k}$, the norm induced by $b_k(\cdot,\cdot)$, is defined by $\|u\|_{b,k}^2 = b_k(u,u)$.


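Proof Clearly, from Theorem 2 it follows that

    $\|I - DB_k(f_k)Dg_k(u_k^*)\|_{b,k} = \sup_{u \ne 0} \frac{|b_k([I - DB_k(f_k)Dg_k(u_k^*)]u,u)|}{b_k(u,u)} \le \eta_k.$

For a given positive number $\delta_k$ satisfying $\sigma_k = \delta_k + \eta_k < 1$, the differentiability of $\psi_k$ at $u_k^*$ gives that there exists a neighborhood of $u_k^*$, $O(u_k^*, \epsilon_k) = \{u_k : \|u_k - u_k^*\|_{b,k} < \epsilon_k\}$, such that

    $\|\psi_k(u_k) - \psi_k(u_k^*) - D\psi_k(u_k^*)(u_k - u_k^*)\|_{b,k} \le \delta_k \|u_k - u_k^*\|_{b,k},$

where $u_k \in O(u_k^*, \epsilon_k)$, $\epsilon_k$ is a positive number, and $\psi_k$ is defined in (17). Thus

    $\|\psi_k(u_k) - u_k^*\|_{b,k} = \|\psi_k(u_k) - \psi_k(u_k^*)\|_{b,k}$
    $\le \|\psi_k(u_k) - \psi_k(u_k^*) - D\psi_k(u_k^*)(u_k - u_k^*)\|_{b,k} + \|D\psi_k(u_k^*)(u_k - u_k^*)\|_{b,k}$
    $\le (\delta_k + \|D\psi_k(u_k^*)\|_{b,k})\, \|u_k - u_k^*\|_{b,k} \le \sigma_k \|u_k - u_k^*\|_{b,k}.$

Hence, by induction, for any $u_k^0 \in O(u_k^*, \epsilon_k)$, we can easily show that $u_k^i \in O(u_k^*, \epsilon_k)$, and

    $\|u_k^{i+1} - u_k^*\|_{b,k} \le \sigma_k \|u_k^i - u_k^*\|_{b,k}, \quad i = 0,1,2,\ldots.$

□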







In a nonlinear multigrid algorithm, the following equations have been used on Mk for k <l\ 

gk{v) = /fc + ~ §k{ui)], (32) 

and 

gk{y^ — fk ^kQk\,fk+l 5^fc+l 

where is the j-th iterate of the nonlinear multigrid method, and Vi is the iterative value after 
the pre-smoothing step of the nonlinear multigrid algorithm. Hence, to ensure that a nonlinear 
multigrid algorithm is well-defined, we should show that the solution of either (32) or (33) lies 
in the neighborhood 0{ul, Ck) given in Theorem 3. 


Theorem 4 Let $O(u_k^*, \epsilon_k)$ be a neighborhood of $u_k^*$. Assume that

(a) There exists a constant C such that for all $u \in M_k$, $\|Dg_k^{-1}(u)\|_{b,k} \le C$.

(b) The auxiliary vector $\tilde{u}_k$ satisfies $\tilde{u}_k \in O(u_k^*, \epsilon_k/2)$.

(c) The auxiliary miue Sk satisfies Sk < when r ^ 0, otherwise, s* = 0. Here 


    $r = \max\{\|f_k - g_k(u_k^i)\|_{b,k},\ \|Q_k[f_{k+1} - g_{k+1}(v_i)]\|_{b,k}\},$

and $v_i$ is the iterative value after the pre-smoothing.

Then, the solution of either (32) or (33) lies in the neighborhood $O(u_k^*, \epsilon_k)$.


Proof. We only show that the solution of (32) lies in 0(h]t, tk)- The proof for (33) is similar. 
Set rk = fk- gkiy-k), and w = gL^ifk + SA,rfc). If r* = 0, then w = Uk € 0(ul, Ck). If r* ^ p, 
with assumptions (a) to (c), we have 

ik^^felkfc - Ibfc k/fc + - “Tlkfe 

< hkH/k + Hrk) - Ufctli,* + - “fclkfc 

- IIS '*^ih + Skn) - 9k^ifk)\\b,k + l|ufc - ul\\b,k 
S {n)'llfc,*|lt’fcjkfc + [i®fc " 

% + l|n* — < ejfe/2 + efc/2 = Ck, 


i.e. $w \in O(u_k^*, \epsilon_k)$. This completes the proof of Theorem 4. □

In this section, as an application of the theory in Section 3, we consider the convergence of the nonlinear V-cycle for solving the second order elliptic, mildly nonlinear boundary value problem

    $-\nabla \cdot (a \nabla u) + B(x,u) = f(x)$, in $\Omega$,
    $u = 0$, on $\partial\Omega$,    (34)

where $\Omega$ is a bounded, Lipschitz, polyhedral domain, $a \in W^{1,\infty}(\Omega)$, $a \ge C_0 > 0$ a.e. on $\Omega$, and $f \in L^2(\Omega)$.
Let $D_2B$ denote the derivative of B with respect to the second variable. We make the following assumptions on $D_2B$ in this section.

A2) $D_2B(x,u)$ is continuous in $\Omega \times R$, and there exist constants $C_1$ and $C_2$ such that

    $0 < C_2 \le D_2B(x,u) \le C_1.$

A3) $D_2B(x,u)$ satisfies a Lipschitz condition: there exists a constant L, independent of u and v, such that

    $|D_2B(x,u) - D_2B(x,v)| \le L|u - v|,$    (35)

for all (x,u), (x,v) on a subset of $\Omega \times R$.

Let $H = H_0^1(\Omega)$ be the Sobolev space [2]. The weak form of (34) is thus: Find $u \in H$ such that

    $a(u,v) = (f,v)_{L^2}, \quad \forall v \in H,$    (36)

where

    $a(u,v) = \int_\Omega [a \nabla u \cdot \nabla v + B(x,u)v]\,dx$, and $(f,v)_{L^2} = \int_\Omega f(x)v(x)\,dx.$    (37)


Let $M_k$ be a set of piecewise linear functions with respect to a quasi-uniform triangulation $T_k$ on $\Omega$ of size $h_k$ in the usual sense [8]. We assume that there is a constant c, independent of k, such that $h_{k-1} \le c h_k$, and that these triangulations are nested in the sense that any triangle in $T_{k-1}$ can be written as a union of triangles of $T_k$.

The finite element discretization for (36) on each $M_k$ is as follows: Find $u_k \in M_k$ such that

    $a(u_k,v) = (f,v)_{L^2}, \quad \forall v \in M_k,$    (38)

where $k = 1,2,\ldots,l$.

Based on Theorem 39.12 in [16], we assume that 

A4) Equations (36) and (38) have unique solutions u* and Uk, respectively. For u* € 
with /3 € (0,1], there exists a constant c, independent of hk, such that 

||w*< cfif, (39) 

where ~ 1,2, • • •,/, and \\ • \\i is the usual norm in Sobolev space [2]. 

We solve equation (38) by the nonlinear multigrid V-cycle scheme with the smoother $S_k^m$ defined by m steps of the damped-Jacobi-Newton iteration. To prove its convergence, using Theorem 3, we only need to verify Assumptions (29) and (28).

We first prove Assumption (29) for the smoother $S_k^m$ below.

Let $\{\varphi_i\}_{i=1}^{n_k}$ be a natural nodal basis for $M_k$, where $n_k = \dim M_k$. We may consider the following equation on $M_k$: For $f_k \in M_k$, find $u_k \in M_k$ such that

    $(g_k(u_k), \varphi_i)_k = (f_k, \varphi_i)_k, \quad i = 1,2,\ldots,n_k,$

with $g_k$ being defined by

    $(g_k(u_k), v)_k = a(u_k,v) - (f,v)_{L^2}, \quad \forall v \in M_k.$    (40)
Let $u_k^j$ be the j-th iterate of the damped-Jacobi-Newton iteration using a damping parameter $\theta$, expressed as follows:

    $u_k^{j+1} = u_k^j + R_k(u_k^j)[f_k - g_k(u_k^j)],$

where the linear operator $R_k(u) : M_k \to M_k$ is defined by

Rk{u)v = {v,(pi)k(pi Wv e Mk. 

i=l V J 

Since $S_k(u) = u + R_k(u)(f_k - g_k(u))$, and

    $DS_k(u_k^*) = I - R_k(u_k^*)Dg_k(u_k^*),$    (41)

we have

    $DS_k^m(u_k^*) = [I - R_k(u_k^*)Dg_k(u_k^*)]^m = [DS_k(u_k^*)]^m.$

Clearly, $DS_k(u_k^*)$ is symmetric, so Assumption b) of Theorem 2 holds. From (41) we see that the Jacobi-Newton iteration has a form similar to the damped-Jacobi method in [17]. Therefore, using the same argument as in [17], we can show that Assumption (29) is satisfied by the damped-Jacobi-Newton iteration.

We next verify Assumption (28). Let g be defined by

    $(g(u),v) = a(u,v) - (f,v)_{L^2}, \quad \forall v \in H.$    (42)

It is easy to show that Dg(w), defined by

    $(Dg(w)u,v) = \int_\Omega [a \nabla u \cdot \nabla v + D_2B(x,w)uv]\,dx, \quad \forall u,v \in H,$

is symmetric, positive definite on H.

Hence, from (40) it follows that $Dg_k(w)$ is a symmetric, positive definite operator on $M_k$. Thus, the bilinear form on $M_k \times M_k$

    $b_k(u,v) = (Dg_k(w)u,v)_k, \quad \forall u,v \in M_k,$    (43)

is symmetric, positive definite.

For simplicity, we let $A_k = Dg_k(u_k^*)$, and define a family of norms as follows:

    $\|v\|_{r,k}^2 = (A_k^r v, v)_k, \quad \forall v \in M_k,$

where r is a positive number. In addition, we note that ||u|1oa; is equivalent to ||u|)l 2 and 

We can now show that Assumption (28) holds in the following theorem. The proof of this theorem can be found in [15].


Theorem 5 Let $M_k$ be the space of continuous piecewise linear functions with respect to a quasi-uniform triangulation, and let $u_k^*$ be the solution of the equation $g_k(u) = f_k$ on $M_k$. Assume that (A1) to (A4) hold, and that the solution U of the variational problem

    $b_k(U,v) = (F,v)_{L^2}, \quad \forall v \in H$    (44)
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Figure 1: A comparison of a nonlinear V-cycle 
and a linear V-cycle. Here • • • : the linear V- 
cycle method for solving (46) with 6 = 0, +++: 
the nonlinear V-cycle method for solving (46) 
with a = 6 = 1, and h = jlg. 


Figure 2: Dependency of the convergence rate 
of the nonlinear V-cycle on the auxiliary vector. 

Here + + + : Ufc = 0, —: Uk = - 

- : Uk = 5'|°°(0), • • • : Uk = 0.5, h 
and a = 6 = 1 in (46). 


1 

128 ’ 


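is in $H^{1+\beta}(\Omega)$ for some $\beta \in (0,1]$, and satisfies

    $\|U\|_{H^{1+\beta}} \le C \|F\|_{H^{\beta-1}}$    (45)

for some positive constant C, independent of F. Then, there exists a constant C such that

    $|b_k((I - P_{k-1})u,u)| \le C \left( \frac{\|A_k u\|_k^2}{\lambda_k} \right)^{\beta} [b_k(u,u)]^{1-\beta}, \quad \forall u \in M_k,$

where $\lambda_k$ is the largest eigenvalue of $Dg_k(u_k^*)$.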


NUMERICAL EXPERIMENTS 


In this section, we present numerical experiments with the nonlinear multigrid method for solving the following model problem [10]:

    $-(u_{xx} + u_{yy}) + b \sinh(au) = f$  in $\Omega = (0,1) \times (0,1)$,
    $u = 0$  on $\partial\Omega$,    (46)

where a and b are positive numbers. The right hand side term f of (46) is chosen such that $u = \sin \pi x \sin \pi y$ is the solution.

The discretization equation of (46) is defined by the five-point stencil with $h_k = 1/2^k$ ($1 \le k \le l$). The smoothing process $S_k^m$ consists of m steps of the Gauss-Seidel-Newton iteration.
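
The smoother can be illustrated with a short sketch. It assumes the nonlinear term is $b \sinh(au)$, a lexicographic sweep order, and one scalar Newton step per grid point; these details, like the function names, are assumptions for illustration and are not taken from the authors' program.

```python
import numpy as np


def model_rhs(n, a=1.0, b=1.0):
    """Right-hand side of (46) for the exact solution sin(pi x) sin(pi y)."""
    x = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    return 2.0 * np.pi ** 2 * exact + b * np.sinh(a * exact)


def residual_norm(u, f, h, a=1.0, b=1.0):
    """Max-norm of the five-point residual at the interior points."""
    c = u[1:-1, 1:-1]
    lap = (4.0 * c - u[:-2, 1:-1] - u[2:, 1:-1] - u[1:-1, :-2] - u[1:-1, 2:]) / h ** 2
    return float(np.max(np.abs(f[1:-1, 1:-1] - lap - b * np.sinh(a * c))))


def gauss_seidel_newton(u, f, h, a=1.0, b=1.0, sweeps=1):
    """Lexicographic Gauss-Seidel sweeps with one Newton step per point."""
    n = u.shape[0]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                nb = u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1]
                # Scalar residual of the discrete equation at point (i, j).
                res = f[i, j] - ((4.0 * u[i, j] - nb) / h ** 2 + b * np.sinh(a * u[i, j]))
                # Derivative of the local equation with respect to u[i, j].
                dg = 4.0 / h ** 2 + a * b * np.cosh(a * u[i, j])
                u[i, j] += res / dg
    return u


if __name__ == "__main__":
    n, h = 9, 1.0 / 8.0
    f = model_rhs(n)
    u = np.zeros((n, n))          # zero initial guess, zero boundary values
    print("initial residual:", residual_norm(u, f, h))
    gauss_seidel_newton(u, f, h, sweeps=50)
    print("after 50 sweeps:", residual_norm(u, f, h))
```

Used alone, this iteration converges slowly on fine grids; in the V-cycle it serves only to damp the high-frequency error components.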
Figure 3: The relation of the relative residual of the nonlinear V-cycle to the parameter $s_k$ at the 12th V-cycle iteration. This figure shows that when $s_k$ is around 1, the nonlinear V-cycle has almost the same convergence rate. Here a = b = 1 in (46).


We set $m_1 = m_2 = m$ for all grid levels and the coarsest grid size $h_1 = 1/2$ for all of our numerical examples. In addition, the full-weighting restriction operator $Q_k$ [9] was used, and only one step of the Gauss-Seidel-Newton iteration was applied to solve the equation on the coarsest grid $M_1$. A zero initial guess and the relative residual stopping criterion were used for all the numerical experiments, which were implemented on a KSR1 supercomputer in single precision, which on that machine is equivalent to regular double precision.
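
As an illustration of the restriction step, the sketch below applies the standard nine-point full-weighting stencil (weights 4, 2, 1 over 16) on a vertex-centered grid with homogeneous Dirichlet boundary values. That this is exactly the variant used in [9] and in the experiments is an assumption; the function name is hypothetical.

```python
import numpy as np


def full_weighting(fine):
    """Restrict a (2N+1) x (2N+1) vertex-centered array to (N+1) x (N+1).

    Interior coarse values are the 1/16 [1 2 1; 2 4 2; 1 2 1] average of the
    fine values; coarse boundary values are left at zero (Dirichlet data).
    """
    nc = (fine.shape[0] - 1) // 2 + 1
    coarse = np.zeros((nc, nc))
    for ic in range(1, nc - 1):
        for jc in range(1, nc - 1):
            i, j = 2 * ic, 2 * jc
            edges = fine[i - 1, j] + fine[i + 1, j] + fine[i, j - 1] + fine[i, j + 1]
            corners = (fine[i - 1, j - 1] + fine[i - 1, j + 1]
                       + fine[i + 1, j - 1] + fine[i + 1, j + 1])
            coarse[ic, jc] = (4.0 * fine[i, j] + 2.0 * edges + corners) / 16.0
    return coarse


if __name__ == "__main__":
    print(full_weighting(np.ones((9, 9))))
```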

We compared the performance of the nonlinear V-cycle with the linear V-cycle method. The linear V-cycle case was obtained from the nonlinear V-cycle program by setting b = 0 in (46). Thus, a Poisson equation was solved by the linear V-cycle method. From Figure 1 we see that the nonlinear multigrid method is as efficient as the linear multigrid method. We checked the dependency of the convergence rate of the nonlinear multigrid method on its two parameters $\tilde{u}_k$ and $s_k$. We used three different values of $\tilde{u}_k$ in the experiments.

1) $\tilde{u}_k = 0$ on all grid levels;

2) $\tilde{u}_k = S_k^m(0)$, i.e. $\tilde{u}_k$ is defined by m steps of the Gauss-Seidel-Newton iteration with zero initial guess. Clearly, by increasing m, we can make $\tilde{u}_k$ approach the exact solution of $g_k(u) = f_k$ as closely as desired.

3) $\tilde{u}_k = Q_k u_{k+1}$, where $u_{k+1}$ denotes the iterative value after the pre-smoothing step of the V-cycle. We call this type of $\tilde{u}_k$ Brandt's choice because it was first used by Brandt in [7].

Figure 2 shows that if $\tilde{u}_k$ is sufficiently close to the solution of $g_k(u) = f_k$, the convergence rate of the V-cycle will be almost the same. Otherwise, the nonlinear V-cycle may be divergent. For example, from this figure we see that the V-cycle with $\tilde{u}_k = 0.5$ was divergent.

For fixed $\tilde{u}_k = 0$, we also made experiments with different values of $s_k$. Figure 3 shows that it is satisfactory to let $s_k$ be around 1.

Finally, we checked the influence of a and b in (46) on the convergence of the nonlinear V-cycle method. The numerical results are reported in Tables 1 to 3. Here we used four different
Uk, h = and mi = m 2 — 1 for all of these numerical experiments. We also used a = 1.0, 
b = 1.0 and a = 3.0 in Table 1, Table 2 and Table 3, respectively. A dash in the tables means that the V-cycle is divergent. From these tables we see that: 1) when $0 < a \le 3$ and $0 \le b \le 10$, $\tilde{u}_k = 0$ is the simplest choice; 2) Brandt's choice worked for $0 < a \le 6$ and $0 < b \le 100$; and 3) the nonlinear V-cycle with $\tilde{u}_k = S_k^m(0)$ using large m can lead to convergence for a pair of a and b for which the nonlinear V-cycle with Brandt's choice is divergent.

Table 1: The performance of the nonlinear V-cycle as the b in (46) becomes larger.

              The total number of iterations
    b      u~_k = 0    u~_k = Q_k u_{k+1}    u~_k = S_k^1(0)    u~_k = S_k^10(0)
    10        13              14                   13                  14
    30        40              13                   14                  13
    100       -               12                   35                  13



Table 2: The performance of the nonlinear V-cycle as the a in (46) becomes larger.

              The number of iterations
    a      u~_k = 0    u~_k = Q_k u_{k+1}    u~_k = S_k^1(0)    u~_k = S_k^10(0)
    0.001     14              14                   14                  14
    2.0       13              14                   14                  14
    3.0       32              14                   14                  15
    6.0       -               12                   -                   30
    7.0       -               -                    -                   20



Table 3: The performance of the nonlinear V-cycle for solving (46) with large a and b.

              The number of iterations
    b      u~_k = 0    u~_k = Q_k u_{k+1}    u~_k = S_k^1(0)    u~_k = S_k^10(0)
    0.01      14              14                   14                  14
    1.0       32              14                   14                  15
    20.0      -               12                   -                   16
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SUMMARY 

A highly accurate and efficient numerical method is developed for modeling 3-D reacting 
flows with detailed chemistry. A contravariant velocity-based governing system is developed for 
general curvilinear coordinates to maintain simplicity of the continuity equation and compactness 
of the discretization stencil. A fully-implicit backward Euler technique and a third-order monotone 
upwind-biased scheme on a staggered grid are used for the respective temporal and spatial terms. 
An efficient semi-coarsening multigrid method based on line-distributive relaxation is used as the 
flow solver. The species equations are solved in a fully coupled way and the chemical reaction source 
terms are treated implicitly. Example results are shown for a 3-D gas turbine combustor with strong 
swirling inflows. 


INTRODUCTION 

Combustion simulation generally requires the solution of the coupled equations of mass, momentum, species balance and energy with detailed thermodynamic and transport relations and finite-rate chemistry. To alleviate the strong interaction between the flow and combustion, and to avoid solving this huge system all at once, the governing equations are usually solved in a semi-coupled way in which the chemical reaction part and the fluid flow part are treated separately. For the flow part, the mass, momentum and energy equations can be solved with an existing CFD code; therefore, most efforts towards modeling combustion are concentrated on the reaction part. Much progress has been made in solving the chemical species equations [1-8].

It is well known that the reaction part, which involves multi-species, multi-step, finite rate kinetics, is a sensitive and stiff system, and that it takes most of the CPU time in most computations. Most of the successful combustion simulations are based on the coupled solution of the chemical reaction system. No general, efficient way has been found to decouple the system and reduce the cost of each iteration. Therefore the most effective approach is to reduce the number of iterations. Since the flow field acts as the carrier of the chemical reaction, it can be anticipated that a quickly established flow field will provide a stable base for the reactions and therefore make the species equations easier to converge. As shown in our previous work [9,7,8], very efficient CFD methods greatly reduce the number of iterations of the reaction part, which is very costly. Furthermore, for practical 3-D combustion, the flow field may be very complex, so the flow part could take a considerable portion of the total CPU time. Therefore, the development of very efficient CFD methods and the development of reaction modeling methods are equally important in combustion simulations.

This paper describes a very accurate and efficient numerical method we have developed for calculating general 3-D reacting flows with detailed chemistry. The principal focus is on the development of a highly efficient solution method and a highly accurate scheme for the chemical species transport equations. Based on the finite volume frame, an implicit method is developed to solve the 3-D Navier-Stokes equations and chemical species transport equations in general curvilinear coordinates. A distinctive feature of this method is that the contravariant velocities are employed as the dependent variables. The momentum equations of the contravariant velocities are discretized in staggered control volumes, while the energy equations and species equations are integrated by using a cell-centered finite volume scheme. In this way, the discretized mass equation retains the simple form it has on Cartesian grids and the stencil is spatially the most compact. A third-order monotone upwind-biased scheme by van Leer [10,11] is used for all the convection terms of the flow equations and species equations to minimize numerical diffusion and maintain the sharp gradients present in flames.

This method was tested by computing the strongly swirling combustion in a 3-D gas turbine combustor. For a 49x65x65 grid of 207,025 grid points, the calculation takes only about 200 time steps and 21.3 CRAY-YMP hours to reduce the residuals by more than three orders of magnitude for all governing equations.


GOVERNING EQUATIONS 


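The governing equations for general compressible reacting flows in integral form can be summarized as follows.

Mass conservation:

    $\int_\Omega \frac{\partial \rho}{\partial t}\,d\Omega + \oint_\Gamma \rho\, \mathbf{q} \cdot \mathbf{n}\,ds = 0$    (1)

Momentum conservation:

    $\int_\Omega \frac{\partial (\rho \mathbf{q})}{\partial t}\,d\Omega + \oint_\Gamma \rho \mathbf{q} (\mathbf{q} \cdot \mathbf{n})\,ds = \oint_\Gamma \boldsymbol{\tau}_n\,ds$    (2)

In low speed combustion, the kinetic energy is negligible compared with the enthalpy; therefore, the energy conservation can be simplified as [12]:

    $\int_\Omega \frac{\partial (\rho h)}{\partial t}\,d\Omega + \oint_\Gamma \rho h (\mathbf{q} \cdot \mathbf{n})\,ds = \oint_\Gamma \boldsymbol{\tau}_n \cdot \mathbf{q}\,ds + \oint_\Gamma \Lambda_h (\nabla h \cdot \mathbf{n})\,ds$    (3)

Chemical species equation:

    $\int_\Omega \frac{\partial (\rho Y_\alpha)}{\partial t}\,d\Omega + \oint_\Gamma \rho Y_\alpha (\mathbf{q} \cdot \mathbf{n})\,ds = \oint_\Gamma \Lambda_Y (\nabla Y_\alpha \cdot \mathbf{n})\,ds + \int_\Omega R_\alpha\,d\Omega, \quad \alpha = 1,2,\ldots,NS$    (4)

Enthalpy and state equations:

    $h = h(Y_\alpha, T), \quad p = \rho R T \sum_\alpha \frac{Y_\alpha}{W_\alpha}$    (5)
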

where t is time, $\Omega$ is a fixed control volume with boundary $\Gamma$, $\rho$ is density, p is pressure, $\mathbf{q}$ is the velocity vector, T is the temperature, h is the enthalpy, $\mathbf{n}$ is the unit outer normal vector of the boundary, $\boldsymbol{\tau}_n$ is the total viscous stress acting on a surface with outer normal vector $\mathbf{n}$, and $R_\alpha$ is the chemical reaction rate of species $\alpha$. R, $Y_\alpha$, and $W_\alpha$ are the gas constant, the mass fraction and the molecular weight of species $\alpha$, respectively, and the specific enthalpy and species diffusion coefficients are determined from

    $\Lambda_h = \frac{\mu_L}{Pr_L} + \frac{\mu_T}{Pr_T}, \quad \Lambda_Y = \frac{\mu_L}{Sc_L} + \frac{\mu_T}{Sc_T},$

where $\mu_L$ is the molecular viscosity, $\mu_T$ the turbulent viscosity determined from the turbulence model, $Pr_L$ and $Pr_T$ are the laminar and turbulent Prandtl numbers, and $Sc_L$ and $Sc_T$ are the laminar and turbulent Schmidt numbers, respectively. From the constitutive relations, we have:

    $[\tau] = -\left(p + \frac{2}{3}\mu \nabla \cdot \mathbf{q}\right)[I] + 2\mu[\epsilon]$    (6)

    $\epsilon_{ij} = \frac{1}{2}\left(\frac{\partial q_i}{\partial x_j} + \frac{\partial q_j}{\partial x_i}\right)$    (7)

    $\mu = \mu_L + \mu_T$    (8)


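The enthalpy h and molecular viscosity $\mu_L$ can be calculated by the following formulas:

    $h = \sum_\alpha Y_\alpha h_\alpha,$
    $h_\alpha = \int_0^T C_{p_\alpha}\,dT = h_{0\alpha} + \int_{T_0}^T C_{p_\alpha}\,dT,$
    $C_{p_\alpha} = C_{p_\alpha}^0 + C_{p_\alpha}^1 T + C_{p_\alpha}^2 T^2 + C_{p_\alpha}^3 T^3 + C_{p_\alpha}^4 T^4,$
    $\mu_L = \sum_\alpha Y_\alpha \mu_\alpha,$
    $\mu_\alpha = \mu_\alpha^0 + \mu_\alpha^1 T + \mu_\alpha^2 T^2 + \mu_\alpha^3 T^3 + \mu_\alpha^4 T^4,$    (9)

where $h_{0\alpha}$ is the standard formation enthalpy of the $\alpha$-th species, and $C_{p_\alpha}^0, \ldots, C_{p_\alpha}^4$ and $\mu_\alpha^0, \ldots, \mu_\alpha^4$ are the polynomial coefficients for $C_{p_\alpha}$ and $\mu_\alpha$, respectively.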

All thermal and transport parameters are obtained by linking with CHEMKIN-II [13] standard 
libraries. 
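
The polynomial fits in (9) can be evaluated directly. The sketch below integrates the $C_p$ polynomial analytically to obtain the species enthalpy and then forms the mass-fraction-weighted mixture value. The coefficient values, the reference temperature of 298.15 K and the function names are hypothetical; in the paper these quantities come from the CHEMKIN-II libraries.

```python
def cp_polynomial(coeffs, temperature):
    """Evaluate C_p = c0 + c1*T + c2*T**2 + ... for one species."""
    return sum(c * temperature ** n for n, c in enumerate(coeffs))


def species_enthalpy(h_formation, coeffs, temperature, t_ref=298.15):
    """h = h_formation + integral of C_p from t_ref to temperature."""
    def antiderivative(t):
        return sum(c * t ** (n + 1) / (n + 1) for n, c in enumerate(coeffs))
    return h_formation + antiderivative(temperature) - antiderivative(t_ref)


def mixture_enthalpy(mass_fractions, species_enthalpies):
    """Mass-fraction-weighted sum over all species."""
    return sum(y * h for y, h in zip(mass_fractions, species_enthalpies))


if __name__ == "__main__":
    # Hypothetical two-species mixture with made-up fit coefficients.
    coeffs_a = [1000.0, 0.1, 0.0, 0.0, 0.0]
    coeffs_b = [900.0, 0.2, 0.0, 0.0, 0.0]
    h_a = species_enthalpy(0.0, coeffs_a, 1200.0)
    h_b = species_enthalpy(-5.0e4, coeffs_b, 1200.0)
    print(mixture_enthalpy([0.3, 0.7], [h_a, h_b]))
```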


CHEMICAL REACTION MODEL 

For laminar flames, the chemical reaction rate Ra for the ath species can be calculated by 



where is the molecular weight of species a, Np is the total number of reaction steps. Ns is the 
total number of species, refers to the stoichiometric coefficient of products (reactants), and 

n, = ^. 

* UJ\ 

The function $K_j^f$ ($K_j^b$) is the rate constant for the forward (backward) reaction step j. We assume $K_j^f$ has the following Arrhenius temperature dependent form:

    $K_j^f = A_j T^{a_j} \exp\left(-\frac{E_j}{RT}\right),$    (11)

and $K_j^b$ has a similar expression. The reverse rate constant can be written in terms of the forward rate constant and the equilibrium constant $K_j^c$ as

    $K_j^b = K_j^f / K_j^c.$    (12)

Here, the $K_j^c$ are also obtained by calling CHEMKIN-II. The pre-exponential factor $A_j$, the temperature exponent $a_j$, and the activation energy $E_j$ can be compiled from published experimental work.
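
A minimal sketch of the rate-constant evaluation in (11) and (12) follows. The numerical values in the example are made up, SI units with the universal gas constant are assumed, and the equilibrium constant is passed in as a plain number, whereas the paper obtains it from CHEMKIN-II.

```python
import math

R_UNIVERSAL = 8.314462618  # universal gas constant in J/(mol K), assumed units


def forward_rate(pre_exponential, temp_exponent, activation_energy, temperature):
    """Arrhenius form A * T**a * exp(-E / (R T))."""
    return (pre_exponential * temperature ** temp_exponent
            * math.exp(-activation_energy / (R_UNIVERSAL * temperature)))


def backward_rate(k_forward, k_equilibrium):
    """Reverse rate constant from the forward rate and the equilibrium constant."""
    return k_forward / k_equilibrium


if __name__ == "__main__":
    # Hypothetical reaction step: A = 1e10, a = 0.5, E = 1.5e5 J/mol.
    kf = forward_rate(1.0e10, 0.5, 1.5e5, 1500.0)
    kb = backward_rate(kf, 2.0e3)  # the equilibrium constant here is made up
    print(kf, kb)
```

The exponential makes the rate extremely sensitive to temperature, which is one source of the stiffness of the species equations.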

For turbulent reacting flows, the Algebraic Correlation Closure(ACC) model is used to introduce 
a correction term to the reaction rate [7,8]. 
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CONTRAVARIANT VELOCITIES AND STAGGERED GRID 


One may think that the use of contravariant velocities on a staggered grid will result in messy governing equations and cause great difficulties in coding. However, that is not always true. The following are the reasons why we choose this approach to solve reacting flows on arbitrary grids:

• Using a staggered grid can result in more accurate and robust schemes, as concluded by numerical analysis and confirmed in previous calculations on regular Cartesian grids.

• On general curvilinear grids, the staggered grid method can be put to best use by combining it with contravariant velocities. For each contravariant velocity component, the discretization stencil for its main direction pressure gradient is spatially the most compact, therefore eliminating the possibility of odd-even decoupling of pressure.

• The use of the contravariant velocity also benefits the solution of the mass, energy and chemical species equations. The flow convection can be accurately represented.

• With a proper discretization method and careful selection of the definition locations of the variables, the governing equations can be kept simple enough for the momentum equations, and even simpler for all scalar conservation equations.

• Most importantly, this method retains the close relation between mass flux and pressure difference on curvilinear grids. Therefore the pressure-correction method can be used very efficiently. This feature yields a fast convergence rate on curvilinear grids, similar to that on Cartesian grids.

Let (u,v,w) be the velocity components in Cartesian coordinates (x,y,z), and (U,V,W) be the contravariant velocity under computational coordinates $(\xi,\eta,\zeta)$; their relations can be described as:

    $U = J(u\xi_x + v\xi_y + w\xi_z)$
    $V = J(u\eta_x + v\eta_y + w\eta_z)$    (13)
    $W = J(u\zeta_x + v\zeta_y + w\zeta_z)$

where J is the transformation Jacobian from (x,y,z) to $(\xi,\eta,\zeta)$.

From the above relations, the velocity components in the x, y, z directions can be found:

    $\begin{bmatrix} u \\ v \\ w \end{bmatrix} = A \begin{bmatrix} U \\ V \\ W \end{bmatrix}, \quad A = \begin{bmatrix} J\xi_x & J\xi_y & J\xi_z \\ J\eta_x & J\eta_y & J\eta_z \\ J\zeta_x & J\zeta_y & J\zeta_z \end{bmatrix}^{-1}$    (14)

Equation 14 will be frequently used hereafter; for simplicity it is denoted as:

    $q_i = a_{im} U^m$    (15)

where $[q_1, q_2, q_3]^T = [u, v, w]^T$ and $[U^1, U^2, U^3]^T = [U, V, W]^T$.
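
Relations (13) and (14) amount to one 3x3 matrix product and one linear solve per point. The sketch below illustrates this with NumPy; the metric values are made up and the function names are hypothetical.

```python
import numpy as np


def metric_matrix(jacobian, xi_grad, eta_grad, zeta_grad):
    """Rows are J*(xi_x, xi_y, xi_z), J*(eta_x, ...), J*(zeta_x, ...)."""
    return jacobian * np.array([xi_grad, eta_grad, zeta_grad], dtype=float)


def to_contravariant(q, jacobian, xi_grad, eta_grad, zeta_grad):
    """Map Cartesian (u, v, w) to contravariant (U, V, W) as in (13)."""
    return metric_matrix(jacobian, xi_grad, eta_grad, zeta_grad) @ np.asarray(q, dtype=float)


def to_cartesian(contravariant, jacobian, xi_grad, eta_grad, zeta_grad):
    """Invert the mapping as in (14) by solving the 3x3 system."""
    m = metric_matrix(jacobian, xi_grad, eta_grad, zeta_grad)
    return np.linalg.solve(m, np.asarray(contravariant, dtype=float))


if __name__ == "__main__":
    # Hypothetical metrics at one grid point.
    metrics = (2.0, (1.0, 0.2, 0.0), (0.0, 1.0, 0.1), (0.3, 0.0, 1.0))
    big_u = to_contravariant((1.0, 2.0, 3.0), *metrics)
    print(big_u, to_cartesian(big_u, *metrics))
```

Solving the system rather than forming the inverse explicitly is a design choice of this sketch; a production code would more likely precompute the coefficients $a_{im}$ of (15) once per grid point.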

In this work, the basic scheme is the finite volume method. The computational domain is dis¬ 
cretized into a number of quadrilateral cells in two dimensions or hexahedral cells in three dimensions. 
As in Fig. 1, 1-2-3-4-5-6-7-8 forms a typical cell in three dimensional problems. In finite volume 
formulation, the contravariant velocities can be expressed as: 

^i+^,j,k - (q- Sje 78 )i+ij,h 

^i,j + ^,k i.q ' “^ZSTs); J.|-i J; (1®) 

+ i = (? •5'3487)ij,i + i 

where subscripts i,j,k denote the cell index in each of the three curvilinear coordinate directions, 
respectively. In order to retain the merit of staggered grid, the contravariant velocities are defined 
at different locations as shown in Eqn. 16 and Fig. 1. 
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Figure 1; Cell Locations in Three Dimensional Grid 


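Generally the face vectors are denoted as follows for clarity:

    $S^1 = (S_{1x}, S_{1y}, S_{1z}), \quad S^2 = (S_{2x}, S_{2y}, S_{2z}), \quad S^3 = (S_{3x}, S_{3y}, S_{3z})$    (17)

In the finite volume frame, equation 13 and equation 14 are expressed as:

    $U = uS_{1x} + vS_{1y} + wS_{1z}$
    $V = uS_{2x} + vS_{2y} + wS_{2z}$    (18)
    $W = uS_{3x} + vS_{3y} + wS_{3z}$

    $\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} S_{1x} & S_{1y} & S_{1z} \\ S_{2x} & S_{2y} & S_{2z} \\ S_{3x} & S_{3y} & S_{3z} \end{bmatrix}^{-1} \begin{bmatrix} U \\ V \\ W \end{bmatrix}$    (19)
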

In the actual computation, $\rho U$, $\rho V$, and $\rho W$ are regarded as the dependent variables instead of U, V, and W, because they are conserved quantities and the resulting governing equations are relatively simple. Their definition locations are the same as those of U, V, and W: $\rho U$ is defined at $(i+\frac{1}{2}, j, k)$, $\rho V$ is defined at $(i, j+\frac{1}{2}, k)$, and $\rho W$ is defined at $(i, j, k+\frac{1}{2})$. All other variables, $\rho$, p, h, and $Y_\alpha$, are defined at the cell centers. Only p, $\rho U$, $\rho V$, $\rho W$, h, and $Y_\alpha$ are the dependent variables which are solved directly from the integral conservation equations (1-4). All other parameters are determined from the relations (5-10).

The governing equations for contravariant velocities can be established through coordinate transformation, but their forms are then quite complicated. Instead, we can obtain the equations more easily by applying the momentum equation to certain control volumes. For example,
the equation for j can be obtained by simply multiplying the Eqn. 2 with the face vector 

1 j k’ applied to control volume j., which is formed by connecting ^-line mid-points 

a-b-c-d-h-e-f-g as shown in Fig. 1 

/ g/ ^ dQ+ f p{q-S^){q-n)ds = f 5^ • f„ds (20) 
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Notice = Sj^i j. is a constant vector within the control volume Qp = j In the above, 

all q will be eventually expressed in terms of U, V and W by using Eqns. (18,19). We prefer to do 
the transformation later in the succeeding sections, because it will be much easier to do that after 
discretization. 

The momentum equations for pV and pW can be obtained in the similar way by applying to 
jj. and j respectively. 

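All the other equations, i.e., mass conservation, energy conservation and species equations, are applied to control volume $Vol_{i,j,k}$. They can be put in a general form:

    $\int_\Omega \frac{\partial (\rho\phi)}{\partial t}\,d\Omega + \oint_\Gamma \rho\phi (\mathbf{q} \cdot \mathbf{n})\,ds = \oint_\Gamma \Lambda (\nabla\phi \cdot \mathbf{n})\,ds + \oint_\Gamma F\,ds + \int_\Omega S\,d\Omega$    (21)

where $\phi = [1, h, Y_\alpha]^T$, $F = [0, \boldsymbol{\tau}_n \cdot \mathbf{q}, 0]^T$ and $S = [0, 0, R_\alpha]^T$ with $\alpha = 1,2,\ldots,NS$.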

The above equations are not their final forms; the Cartesian velocity q is still used for simplicity. 
It will be replaced by contravariant velocity during the discretization process in the next two sections. 

STAGGERED FINITE VOLUME SCHEME 


In this section, we begin to discretize the governing equations described in the last section. 


Momentum Equations 


The p{7-equation (20) is applied to the staggered control volume j which is discretized by 

using finite volume method as 


Voi 


»+5iJA' 


ipu): 


n+l 




At 


+ • (P?)f](9 • S)i = Vispp 


1=1 


( 22 ) 


where I is the cell surface index, ranges all the 6 cell surfaces of the control volume Vol^^i j k- 
is the total viscous stress component in j|. direction acted on the surface of control volume 

Voh.i ,• j.. It will be described in the next section. 

Based on the idea of the MUSCL scheme by van Leer [10,11], a partially upwind-biased scheme is developed to approximate the momentum fluxes through cell surfaces. The basic idea is that the flux through a control volume surface is regarded as the product of the mass flow and the conserved quantity. According to the sign of the mass flux, the conserved quantity is set to its upwind-side value. Thanks to the staggered scheme, the mass flux through the surfaces is always directly available. There are only two possible locations for a control volume surface: either it lies along one of the original grid surfaces or it runs through the original grid cell center. In the former case, the mass flux is already defined there. In the latter case, since the Cartesian velocity and density are defined at the cell center, the mass flow can also be found straightforwardly. Therefore only the conserved quantity at the surface needs to be interpolated or obtained through reconstruction of data from the cell-averaged values, as in van Leer's MUSCL method. This feature ensures that the calculated flux is continuous when the mass flow changes sign. For example, if the flux (F) through a control volume surface (S) in the i direction consists of the mass flow (M) and the conserved quantity ($\psi$), then

    $F_i = (M \cdot S)_i\, \psi_i = (M \cdot S)_i^{+}\, \psi_{i(-)} + (M \cdot S)_i^{-}\, \psi_{i(+)}$    (23)

In the above, the superscripts +, - on a variable denote the positive and negative part of the variable, respectively,

    $M^{+} = \max(M, 0), \quad M^{-} = \min(M, 0)$    (24)

and the marks (-), (+) on an index indicate that the variable takes its limit value on the interface from the left or the right, respectively. For instance, in the i direction we have:

    $\psi_{i(-)} = \lim_{l \to i,\ l < i} \psi_l, \quad \psi_{i(+)} = \lim_{l \to i,\ l > i} \psi_l$    (25)


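High-resolution schemes up to third order can be constructed by setting

    $\psi_{i(-)} = \psi_{i-1/2} + \frac{\sigma_{i-1/2}}{4}\,[(1 - \kappa)\nabla + (1 + \kappa)\Delta]\,\psi_{i-1/2}$    (26)

    $\psi_{i(+)} = \psi_{i+1/2} - \frac{\sigma_{i+1/2}}{4}\,[(1 + \kappa)\nabla + (1 - \kappa)\Delta]\,\psi_{i+1/2}$    (27)

where $\nabla$ and $\Delta$ are backward and forward difference operators, and $\kappa$ is a parameter used to control the order of the scheme. $\kappa = 1/3$ is used in the present method to construct the third-order scheme. When $\kappa = -1$ the scheme reduces to the second-order fully upwind method. The limiter $\sigma$ is adopted to ensure monotone interpolation, following Koren [14], as:

    $\sigma_{i-1/2} = \frac{3\nabla\psi_{i-1/2}\,\Delta\psi_{i-1/2} + \epsilon}{2(\nabla\psi_{i-1/2} - \Delta\psi_{i-1/2})^2 + 3\nabla\psi_{i-1/2}\,\Delta\psi_{i-1/2} + \epsilon}$    (28)
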


where 9, a small constant with a typical value of 10 is added to prevent division by zero. 
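
The upwind flux splitting (23)-(24) and the limited reconstruction (26)-(28) can be sketched in one dimension as follows. The sketch assumes a limiter of the Koren-type form $\sigma = (3\nabla\psi\Delta\psi + \epsilon)/(2(\nabla\psi - \Delta\psi)^2 + 3\nabla\psi\Delta\psi + \epsilon)$, uses plain cell indices in place of the staggered half-indices, and takes $\epsilon = 10^{-12}$; all three choices, and the function names, are illustrative assumptions.

```python
def limiter(grad_back, grad_fwd, eps=1.0e-12):
    """Assumed Koren-type limiter; equals 1 when both differences agree."""
    prod = 3.0 * grad_back * grad_fwd
    return (prod + eps) / (2.0 * (grad_back - grad_fwd) ** 2 + prod + eps)


def left_state(psi, i, kappa=1.0 / 3.0, eps=1.0e-12):
    """Value extrapolated from cell i towards its right-hand face."""
    gb, gf = psi[i] - psi[i - 1], psi[i + 1] - psi[i]
    sigma = limiter(gb, gf, eps)
    return psi[i] + 0.25 * sigma * ((1.0 - kappa) * gb + (1.0 + kappa) * gf)


def right_state(psi, i, kappa=1.0 / 3.0, eps=1.0e-12):
    """Value extrapolated from cell i towards its left-hand face."""
    gb, gf = psi[i] - psi[i - 1], psi[i + 1] - psi[i]
    sigma = limiter(gb, gf, eps)
    return psi[i] - 0.25 * sigma * ((1.0 + kappa) * gb + (1.0 - kappa) * gf)


def upwind_flux(mass_flow, psi_left, psi_right):
    """Positive mass flow carries the left value, negative the right value."""
    return max(mass_flow, 0.0) * psi_left + min(mass_flow, 0.0) * psi_right


if __name__ == "__main__":
    psi = [0.0, 1.0, 2.0, 3.0]
    # Face between cells 1 and 2: left state from cell 1, right state from cell 2.
    print(upwind_flux(2.0, left_state(psi, 1), right_state(psi, 2)))
```

Because the flux is built from the positive and negative parts of the mass flow, it passes continuously through zero when the mass flow changes sign, which is the property emphasized in the text.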

In our solution algorithm, only ,(pC/)i_Lj,fc, (pt7).^i_^.+j j,, (pt7),.+ i j,, {pU)i^Lj^k+i 

{pU)i+ij^k-i} {pU)i+Lj,kJ Pi,j,k and Pi+i,j,h are treated implicitly for p(7-equation. In general, the 
pt7-equation can be expressed in 6 form as: 

AjB5(pC/)i+|j j, 4- AwS{pU)i_L j k + ANS{pU)i+Lj+i^h + ^sS{pU)i+ij_i^k 
+ AF6ipU)i^L j ,,+i + ABS{pU)i+i j k-i + Ac6{pU)i^x j k 

4- A^Spij^k +j j, (29) 


where Ru denotes the residual of p[/-equation, including convection and diffusion part. 
Similarly, the momentum equations of pV and pW can be found. 


Scalar Conservation Equations 


All the scalar conservation equations (21) are applied to control volume $Vol_{i,j,k}$ with a cell-centered finite volume scheme. The upwind-biased scheme with limiter described above is used for the convection terms, and a second-order compact central difference scheme for the diffusion terms. The only exception is the mass conservation equation, which benefits most from the staggered grid: the discretized equation has the simplest form and is the most compact in space in terms of the contravariant flux velocity


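$$\delta(\rho U)_{i+1/2,j,k} - \delta(\rho U)_{i-1/2,j,k} + \delta(\rho V)_{i,j+1/2,k} - \delta(\rho V)_{i,j-1/2,k}$$
$$+\, \delta(\rho W)_{i,j,k+1/2} - \delta(\rho W)_{i,j,k-1/2} = -Rm_{i,j,k} \qquad (30)$$

where

$$\cdots\ \rho^{n+1} - \rho^{n}\ \cdots \qquad (31)$$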


In our solution method the time-dependent term of mass equation is dropped for fast convergence.
All other equations are discussed here in their general form (21) except for the source term and
the stress work term. The source terms of the species equations are usually dominant and of strong 
non-linearity. We will discuss the treatment of those source terms in the next sub-section. The stress 
work term in the energy equation will be discussed in the next section, since it has no contribution 
to the implicit coefficients. If we leave the implicit coefficients contributed by the source terms in 
the next sub-section, the discretized forms of Eqn.(21) are assumed to have the following form: 

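$$\Phi_E\,\delta\phi_{i+1,j,k} + \Phi_W\,\delta\phi_{i-1,j,k} + \Phi_N\,\delta\phi_{i,j+1,k} + \Phi_S\,\delta\phi_{i,j-1,k}$$
$$+\, \Phi_F\,\delta\phi_{i,j,k+1} + \Phi_B\,\delta\phi_{i,j,k-1} + \Phi_C\,\delta\phi_{i,j,k} = -Res(\phi) \qquad (32)$$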

where $Res(\phi)$ is the residual of the $\phi$-equation.

The convection term is discretized by the same method described in the last sub-section for the
convection terms of the momentum equations. The diffusion term on the right side of Eqn.(21) is
discretized in two steps. First we calculate the gradient on the cell surface by applying
Gauss's formula to a locally formed staggered control volume; then we assemble the integration. Since
the gradients are computed locally, the resulting scheme reduces to a compact one when a regular grid
is used.


Implicit Treatment of Reaction Source Term 


The major difficulty in calculation of finite rate combustion is the stiffness of the species equations. 
To solve this problem, the source terms (production rate of chemical reaction) must be treated 
implicitly. 

In the last subsection, the discretization of time dependent, convection and diffusion terms of the 
general scalar conservation equation is discussed. For the chemical species equations, the discretized 
equations can be written as: 


= - [CriYo^r " ^t(Y„)” - R„] 


(33) 


where $\delta(\ ) = (\ )^{n+1} - (\ )^{n}$, $C_T$ is the convection term and $D_T$ the diffusion term. $R_\alpha$ is the reaction
rate defined in Eqn.(10)

Nr Ns Ns 

Ra = (34) 


m=l 


1=1 


;=i 


Wi 


The reaction rate is usually very large and dominant near the flame front. Therefore, implicit 
treatment for the production rate term is necessary. Using Taylor expansion, we have 

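$$R_\alpha^{n+1} = R_\alpha^{n} + \sum_{\beta} \frac{\partial R_\alpha}{\partial Y_\beta}\,\delta Y_\beta + \sum_{\beta} O(\delta Y_\beta^{2}). \qquad (35)$$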


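By defining

$$\mathbf{R} = (R_1, R_2, \cdots, R_{N_s})^{T}, \qquad \delta\mathbf{Y} = (\delta Y_1, \delta Y_2, \cdots, \delta Y_{N_s})^{T}, \qquad \mathbf{D} = \frac{\partial R_\alpha}{\partial Y_\beta} \qquad (36)$$

we may have

$$\mathbf{R}^{n+1} \approx \mathbf{R}^{n} + \mathbf{D}\,\delta\mathbf{Y} \qquad (37)$$

where $\mathbf{D}$ is an $N_s$ by $N_s$ matrix.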
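The effect of linearizing the source term can be seen on a toy problem. The sketch below is not the paper's species system: it assumes a hypothetical two-species reaction A → B with rate constant `k`, integrates dY/dt = R(Y) with one point-implicit step (1/Δt · I − D) δY = Rⁿ, and solves the small system by Gauss elimination with partial pivoting. The sign in front of D follows from this toy equation and may differ from the convention of the discretized equations in the text.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]      # augmented matrix (copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def source(y, k):
    """Toy production rates for A -> B (hypothetical test reaction)."""
    return [-k * y[0], k * y[0]]


def jacobian(y, k):
    """D = dR/dY for the toy reaction."""
    return [[-k, 0.0], [k, 0.0]]


def implicit_step(y, k, dt):
    """One point-implicit step: (I/dt - D) dY = R(Y^n)."""
    n = len(y)
    d = jacobian(y, k)
    lhs = [[(1.0 / dt if i == j else 0.0) - d[i][j] for j in range(n)]
           for i in range(n)]
    dy = gauss_solve(lhs, source(y, k))
    return [y[i] + dy[i] for i in range(n)]


if __name__ == "__main__":
    # k*dt = 1000: an explicit step would overshoot to a negative mass fraction
    print(implicit_step([1.0, 0.0], k=1.0e6, dt=1.0e-3))
```

With k·Δt = 1000 an explicit update would drive the first mass fraction far below zero, while the linearized implicit step keeps it between 0 and 1 and conserves the sum of the two species.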

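It is apparent that the implicit treatment of $R_\alpha$ requires the coupled solution of the species
equations. By denoting $Res_\alpha = C_T(Y_\alpha)^{n} - D_T(Y_\alpha)^{n} - R_\alpha^{n}$, the residual of the $\alpha$th species equation, and
$\mathbf{Res} = (Res_1, Res_2, \cdots, Res_{N_s})^{T}$, the residual vector, Eqn.(33) becomes

$$\Phi_E \mathbf{I}\,\delta\mathbf{Y}_{i+1,j,k} + \Phi_W \mathbf{I}\,\delta\mathbf{Y}_{i-1,j,k} + \Phi_N \mathbf{I}\,\delta\mathbf{Y}_{i,j+1,k} + \Phi_S \mathbf{I}\,\delta\mathbf{Y}_{i,j-1,k}$$
$$+\, \Phi_F \mathbf{I}\,\delta\mathbf{Y}_{i,j,k+1} + \Phi_B \mathbf{I}\,\delta\mathbf{Y}_{i,j,k-1} + (\Phi_C \mathbf{I} + \mathbf{D})\,\delta\mathbf{Y}_{i,j,k} = -\mathbf{Res} \qquad (38)$$

where $\mathbf{I}$ is a unit matrix with the elements

$$I_{lm} = \begin{cases} 0 & \text{if } l \neq m \\ 1 & \text{if } l = m \end{cases} \qquad (39)$$

and $\Phi$ is a scalar.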

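Eqn.(38) is the final form of the species equations. They are solved in a coupled way. If line
relaxation is used along the j-line and Gauss-Seidel iteration is used in the i and k directions, for instance, then
Eqn.(38) is rewritten as

$$\Phi_N \mathbf{I}\,\delta\mathbf{Y}_{i,j+1,k} + (\Phi_C \mathbf{I} + \mathbf{D})\,\delta\mathbf{Y}_{i,j,k} + \Phi_S \mathbf{I}\,\delta\mathbf{Y}_{i,j-1,k}$$
$$= -\mathbf{Res} - \Phi_E \mathbf{I}\,\delta\mathbf{Y}_{i+1,j,k} - \Phi_W \mathbf{I}\,\delta\mathbf{Y}_{i-1,j,k} - \Phi_F \mathbf{I}\,\delta\mathbf{Y}_{i,j,k+1} - \Phi_B \mathbf{I}\,\delta\mathbf{Y}_{i,j,k-1} \qquad (40)$$

The left side of the above equation forms a block-tridiagonal system, which can be solved by using
the tailor-made algorithm combined with a Gauss elimination method for the small block matrix
inversion.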
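A block-tridiagonal system of this kind can be solved by forward elimination and back substitution over the blocks, with each small block system handled by Gauss elimination. The sketch below is a generic block Thomas solver written for illustration; it is an assumption that the authors' solver has this structure, and the function names and list-of-rows storage are hypothetical. Zero pivots are not guarded against.

```python
def gauss_solve(a, b):
    """Solve a x = b (a: n x n, b: n x k, lists of rows) with partial pivoting."""
    n, k = len(a), len(b[0])
    m = [a[i][:] + b[i][:] for i in range(n)]     # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + k):
                m[r][c] -= f * m[col][c]
    x = [[0.0] * k for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(k):
            s = sum(m[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = (m[r][n + j] - s) / m[r][r]
    return x


def matmul(a, b):
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matsub(a, b):
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


def block_thomas(lower, diag, upper, rhs):
    """Solve lower[j] x[j-1] + diag[j] x[j] + upper[j] x[j+1] = rhs[j].

    Blocks are m x m; rhs[j] and x[j] are m x 1 columns.
    lower[0] and upper[-1] are never read.
    """
    n = len(diag)
    cprime = [None] * n       # eliminated upper blocks
    dprime = [None] * n       # eliminated right-hand sides
    for j in range(n):
        if j == 0:
            denom, r = diag[0], rhs[0]
        else:
            denom = matsub(diag[j], matmul(lower[j], cprime[j - 1]))
            r = matsub(rhs[j], matmul(lower[j], dprime[j - 1]))
        if j < n - 1:
            cprime[j] = gauss_solve(denom, upper[j])
        dprime[j] = gauss_solve(denom, r)
    x = [None] * n
    x[n - 1] = dprime[n - 1]
    for j in range(n - 2, -1, -1):                # back substitution
        x[j] = matsub(dprime[j], matmul(cprime[j], x[j + 1]))
    return x
```

In the setting of the text, `diag[j]` would play the role of the block multiplying the unknown at line position j, and `lower[j]`, `upper[j]` the scalar-times-identity neighbor blocks, with one block row per cell along the relaxation line.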


VISCOUS STRESS 


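Generally, the viscous stress acting on a surface $\mathbf{S} = \{S_x, S_y, S_z\}$ with outer normal $\mathbf{n} = \mathbf{S}/|\mathbf{S}|$ is defined as

$$\boldsymbol{\tau}_n S = \boldsymbol{\tau}_x S_x + \boldsymbol{\tau}_y S_y + \boldsymbol{\tau}_z S_z$$
$$= \mathbf{i}\,(\tau_{xx} S_x + \tau_{yx} S_y + \tau_{zx} S_z) + \mathbf{j}\,(\tau_{xy} S_x + \tau_{yy} S_y + \tau_{zy} S_z) + \mathbf{k}\,(\tau_{xz} S_x + \tau_{yz} S_y + \tau_{zz} S_z) \qquad (41)$$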

where $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are the unit vectors in the x, y, z directions, respectively.

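The viscous force component in the $\mathbf{n}_u$ direction acting on the surface $\mathbf{S}$ can be obtained by multiplication
of the above equation with $\mathbf{n}_u$:

$$(\boldsymbol{\tau}_n \cdot \mathbf{n}_u)\,S = n_{ux}(\tau_{xx} S_x + \tau_{yx} S_y + \tau_{zx} S_z) + n_{uy}(\tau_{xy} S_x + \tau_{yy} S_y + \tau_{zy} S_z)$$
$$+\, n_{uz}(\tau_{xz} S_x + \tau_{yz} S_y + \tau_{zz} S_z)$$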


From Eqn.(5), we have 


Rm 


h 


L -(P+ 3 b‘V-^) + 2/i^ l = m 

where = Wj.fc 


Blm 

-p + IB 


2 pmm 
3^ 


l ^ m 
I = m 


(42) 


(43) 


In the finite volume formulation, velocity strain can be calculated as: 


du^ 


dx" 


ij,h 


Vol, 


i,j,k 




7,fc 




J cSm 


I c3m\ 


. J c3m \ 




(44) 


Hereafter, all subscripts, except those indicating grid location, are placed upper-right like
superscripts to avoid confusion with the cell index.

After substituting the above equation into Eqn.(42), we have: 

6‘' 


(f„-n„)5 = nC*^''"5''= 1 


riim cyl 01 

5 S - 6 pS 




(45) 


By introducing the difference operators

$$\delta_{\xi}(\ )_{i,j,k} = \delta_{1}(\ )_{i,j,k} = (\ )_{i+1/2,j,k} - (\ )_{i-1/2,j,k}$$
$$\delta_{\eta}(\ )_{i,j,k} = \delta_{2}(\ )_{i,j,k} = (\ )_{i,j+1/2,k} - (\ )_{i,j-1/2,k}$$
$$\delta_{\zeta}(\ )_{i,j,k} = \delta_{3}(\ )_{i,j,k} = (\ )_{i,j,k+1/2} - (\ )_{i,j,k-1/2} \qquad (46)$$

and using Eqn.(15), S*”* then can be expressed in terms of contravariant velocity as:




(47) 


The viscous term in Eqn.( 22) can be obtained by applying the above equation to each surface 
of the control volume j-j., after substituting in the above equation with ^ 

Vis{U) — j, 

+( 5 ' 'in,3)»+i,j,J:+i ~ {S Tr,,3),-4.1 


+ 1 


6'” 




Similarly, we can find the viscous terms in the V- and W-equations.
In the energy equation, the viscous stress work is: 


J^fn-qds = {(S'lfn,, + ' 3)i.j + i,J; 

= S„ipq^S^"')ij,k 

/ xJm \ 

+ - -y-j + 62(S""g”*52')i,,-,*, + 6siB'"'q"^S^%,k (49) 


SOLUTION PROCEDURE 


To solve the governing equations discretized in foregoing sections, an implicit time-marching method 
has been developed. The governing equations are divided into two sets: the flow part and the 
chemical reaction part. They are solved alternately. Different solving techniques are applied to 
those two sets of equations. In the following, the numerical procedure is described in detail. 


Provision of Reaction Mechanism 


For a given combustion problem, the chemical reaction mechanism must be prescribed in addition to
the fuel, oxidizer and boundary conditions. The chemical reaction mechanism is usually obtained
through experiment. In the numerical simulation, it is represented by the pre-exponential factor
$A_j$, the temperature exponent $\alpha_j$ and the activation energy $E_j$ of the chemical reaction equations.
Those parameters and reaction equations are specified through an input data file “mech” provided by
the user of our code. In our test case, which involves the methane-air reaction, the C1-chain reaction mechanism
in Table 1 given by Xu [5] is adopted, in which 16 species are involved in a 45-step reaction chain.

Thermal and transport parameters are obtained by calling CHEMKIN-II subroutines and data
bases.

Table 1. C1-Chain Methane-Air Reaction Mechanism. Rate coefficients:
$K = A T^{\alpha} \exp(-E/RT)$; units: moles, cubic centimeters, seconds, Kelvins, calories

No.  Reaction                    A          α       E
 1   CH3 + H ⇌ CH4               1.90E+36   -7.     9050.
 2   CH4 + O2 ⇌ CH3 + HO2        7.90E+13   0.      56000.
 3   CH4 + H ⇌ CH3 + H2          2.20E+4    3.      8750.
 4   CH4 + O ⇌ CH3 + OH          1.60E+6    2.36    7400.
 5   CH4 + OH ⇌ CH3 + H2O        1.60E+6    2.1     2460.
 6   CH2O + OH ⇌ HCO + H2O       7.53E+12   0.      167.
 7   CH2O + H ⇌ HCO + H2         3.31E+14   0.      10500.
 8   CH2O + M ⇌ HCO + H + M      3.31E+16   0.      81000.
 9   CH2O + O ⇌ HCO + OH         1.81E+13   0.      3082.
10   HCO + OH ⇌ CO + H2O         5.00E+12   0.      0.
11   HCO + M ⇌ H + CO + M        1.60E+14   0.      14700.
12   HCO + H ⇌ CO + H2           4.00E+13   0.      0.
13   HCO + O ⇌ OH + CO           1.00E+13   0.      0.
14   HCO + O2 ⇌ HO2 + CO         3.00E+12   0.      0.
15   CO + O + M ⇌ CO2 + M        3.20E+13   0.      -4200.
16   CO + OH ⇌ CO2 + H           1.51E+7    1.3     -758.
17   CO + O2 ⇌ CO2 + O           1.60E+13   0.      41000.
18   CH3 + O2 ⇌ CH3O + O         7.00E+12   0.      25652.
19   CH3O + M ⇌ CH2O + H + M     2.40E+13   0.      28812.
20   CH3O + H ⇌ CH2O + H2        2.00E+13   0.      0.
21   CH3O + OH ⇌ CH2O + H2O      1.00E+13   0.      0.
22   CH3O + O ⇌ CH2O + OH        1.00E+13   0.      0.
23   CH3O + O2 ⇌ CH2O + HO2      6.30E+10   0.      2600.
24   CH3 + O2 ⇌ CH2O + OH        5.20E+13   0.      34574.
25   CH3 + O ⇌ CH2O + H          6.80E+13   0.      0.
26   CH3 + OH ⇌ CH2O + H2        7.50E+12   0.      0.
27   HO2 + CO ⇌ CO2 + OH         5.80E+13   0.      22934.
28   H2 + O2 ⇌ 2OH               1.70E+13   0.      47780.
29   OH + H2 ⇌ H2O + H           1.17E+9    1.3     3626.
30   H + O2 ⇌ OH + O             2.20E+14   0.      16800.
31   O + H2 ⇌ OH + H             1.80E+10   1.      8826.
32   H + O2 + M ⇌ HO2 + M (a)    2.10E+18   -1.     0.
33   H + O2 + O2 ⇌ HO2 + O2      6.70E+19   -1.42   0.
34   H + O2 + N2 ⇌ HO2 + N2      6.70E+19   -1.42   0.
35   OH + HO2 ⇌ H2O + O2         5.00E+13   0.      1000.
36   H + HO2 ⇌ 2OH               2.50E+14   0.      1900.
37   O + HO2 ⇌ O2 + OH           4.80E+13   0.      1000.
38   2OH ⇌ O + H2O               6.00E+8    1.3     0.
39   H2 + M ⇌ H + H + M (b)      2.23E+12   0.5     92600.
40   O2 + M ⇌ O + O + M          1.85E+11   0.5     95560.
41   H + OH + M ⇌ H2O + M        7.50E+23   -2.6    0.
42   H + HO2 ⇌ H2 + O2           2.50E+13   0.      700.
43   HO2 + HO2 ⇌ H2O2 + O2       2.00E+12   0.      0.
44   H2O2 + M ⇌ OH + OH + M      1.30E+17   0.      45500.
45   H2O2 + OH ⇌ H2O + HO2       1.00E+13   0.      1800.

Third body efficiency with respect to Ar:
(a) H2O = 21, H2 = 3.3, CO = 2.0, CO2 = 5.0, N2 = O2 = 0.
(b) H2O = 6, H = 2, H2 = 3
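As a worked illustration of how the three tabulated parameters are used, the sketch below evaluates the rate coefficient $K = A T^{\alpha} \exp(-E/RT)$ for three entries of Table 1. The gas constant is taken as 1.987 cal/(mol K), consistent with activation energies in calories per mole; the function name, the dictionary layout and the sample temperature are illustrative assumptions, and the reaction labels use "=" simply as a separator.

```python
import math

R_CAL = 1.987  # universal gas constant, cal/(mol K)


def rate_coefficient(a, alpha, e, temperature):
    """Modified Arrhenius form K = A * T**alpha * exp(-E / (R T)).

    a: pre-exponential factor, alpha: temperature exponent,
    e: activation energy in cal/mol, temperature in Kelvin.
    """
    return a * temperature ** alpha * math.exp(-e / (R_CAL * temperature))


# (A, alpha, E) for three rows of Table 1
REACTIONS = {
    "HCO + OH = CO + H2O": (5.00e12, 0.0, 0.0),     # no. 10
    "CO + OH = CO2 + H": (1.51e7, 1.3, -758.0),     # no. 16
    "H + O2 = OH + O": (2.20e14, 0.0, 16800.0),     # no. 30
}

if __name__ == "__main__":
    for name, (a, alpha, e) in REACTIONS.items():
        print(name, rate_coefficient(a, alpha, e, 1500.0))
```

Reaction 10 has zero temperature exponent and zero activation energy, so its coefficient equals A at any temperature, whereas reaction 30, with E = 16800 cal/mol, is strongly temperature dependent.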

Starting Estimate


The governing system is highly nonlinear and its solution requires a good starting estimate.
Similar to the work by Xu et al. [5], we use a solution of infinitely fast combustion [9] as our initial
guess. In the infinitely fast kinetics, the fuel and the oxidizer are separated by a thin exothermic
reaction zone. In this zone the fuel and oxidizer are in stoichiometric proportion and the temperature
and products of combustion are maximized. This infinitely fast reaction solution not only provides
a good initial guess, but also helps overcome the difficulty of ignition with finite-rate combustion.


Solution Method 


A fully implicit time-stepping scheme is developed. In the laminar case, the system consists of
21 equations (if there are 16 species). In the turbulent case there are 23 equations. They are
solved in groups:

(a) $\rho U$, $\rho V$, $\rho W$ and $p$ by solving the mass and momentum equations

(b) $k$, $\epsilon$, $\mu_t$ by solving the turbulence model in the turbulent combustion case

(c) $h$, $Y_\alpha$ by solving the energy and species equations

and, finally, updating

(d) $\rho$, $\mu$ by calling CHEMKIN-II

For the flow part, a line-distribution updating scheme [9,15] is used. To further accelerate
convergence, a semi-coarsening multigrid method is developed. Here we only point out the techniques
used for our specific applications. In our method, the density and pressure are defined at the cell
centers and the contravariant velocities are defined at the cell interfaces. The density and pressure are
transferred from the finer level to the coarser grid by area weighting; the contravariant velocities on the coarser
grid are simply set to the sum of those at the corresponding fine-grid interfaces. The residuals on the finer grid are
restricted to the coarser grid by adding up the corresponding parts on the staggered stencils. After relaxation
is completed on the coarser grid, the corrections are fed back to the finer grid by bilinear interpolation.
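The fine-to-coarse transfers described above can be sketched in two dimensions. The example below assumes coarsening by a factor of two in the i direction only, stores cell-centered data as `p[i][j]`, i-face fluxes as `u[i][j]` (one more row than cells) and j-face fluxes as `v[i][j]` (one more column than cells); these storage conventions and function names are assumptions made for illustration, not the layout of the authors' code.

```python
def restrict_cell(p, area):
    """Area-weighted average of a cell-centered field, coarsening i by 2."""
    nxc, ny = len(p) // 2, len(p[0])
    coarse = []
    for ic in range(nxc):
        row = []
        for j in range(ny):
            a0, a1 = area[2 * ic][j], area[2 * ic + 1][j]
            row.append((a0 * p[2 * ic][j] + a1 * p[2 * ic + 1][j]) / (a0 + a1))
        coarse.append(row)
    return coarse


def restrict_flux_i(u):
    """Coarse i-face ic coincides with the single fine i-face 2*ic."""
    return [u[2 * ic][:] for ic in range(len(u) // 2 + 1)]


def restrict_flux_j(v):
    """A coarse j-face is made of two fine j-faces, so the fluxes are summed."""
    return [[v[2 * ic][j] + v[2 * ic + 1][j] for j in range(len(v[0]))]
            for ic in range(len(v) // 2)]


def net_outflow(u, v, i, j):
    """Net flux leaving cell (i, j) through its four faces."""
    return u[i + 1][j] - u[i][j] + v[i][j + 1] - v[i][j]


if __name__ == "__main__":
    u = [[float(i * i + j) for j in range(2)] for i in range(5)]      # 4 x 2 cells
    v = [[float(2 * i - j * j) for j in range(3)] for i in range(4)]
    uc, vc = restrict_flux_i(u), restrict_flux_j(v)
    print(net_outflow(uc, vc, 0, 0),
          net_outflow(u, v, 0, 0) + net_outflow(u, v, 1, 0))
```

Summing face fluxes in this way makes the net outflow of a coarse cell equal to the sum of the net outflows of the fine cells it contains, which is why a mass residual restricted by simple addition stays consistent with the coarse-grid fluxes.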

For the reaction part, the energy equation is solved together with the species equations. An
implicit alternate line-relaxation method is used for the energy equation. The species equations
are treated in a fully coupled way. The reaction source terms, which are non-linear and usually
troublesome, are treated implicitly by linearization. A block-line tridiagonal solver combined with
vectorized pivoting Gauss elimination is used, which was found to be very effective in handling the sensitivity
and stiffness of the system.

The multigrid method is used only for the momentum and continuity equations in this work. The
other equations, such as the energy equation, species equations and $k$, $\epsilon$ equations, are solved on a single
grid. Therefore, we cannot achieve full multigrid efficiency. However, the whole process for solving
our system is still substantially accelerated.


BOUNDARY CONDITIONS 


The boundary types usually encountered can be classified as inflow, outflow, solid wall, symmetrical
(slip) and periodic. At the inflow boundary, the flow velocity, enthalpy, and chemical species
are specified, but the pressure is extrapolated from the interior; the density is then found
from the state equation.

For the outflow boundary, the back pressure is prescribed and other variables are extrapolated 
from the interior. 

For the solid wall boundary, since a ghost cell is always introduced, both slip (symmetrical) and
non-slip conditions can be easily implemented with the use of contravariant velocities. Take, for example, the wall
condition on a j = constant plane. For the non-slip condition, reverse reflection is applied to all the
contravariant velocities associated with the ghost cell. For the slip (symmetrical) boundary, reverse
reflection is applied only to V, and direct reflection is applied to U and W. In both cases, the contravariant
velocity V lying on this j = constant plane is always set to zero.

The periodic boundary is the simplest. All the values in a ghost cell are taken directly from the
corresponding cell on the other side.

All the boundary conditions are treated fully implicitly through modification of the implicit
coefficients of the discretized equations at the boundary points.


NUMERICAL RESULTS 


This method was applied to calculate the strong swirling combustion in a 3-D gas turbine combustor. 
The computational conditions and grid information are summarized in Table 2. 

Table 2 Strong Swirling Combustion in a 3-D Model Combustor

Table 2.1 Working Conditions

Inflow Speed                           Fuel      Oxidizer   Species Number   Reaction Steps
0.0988 (average), 30° swirling angle   Methane   Air        16               45 (Table 1)

Table 2.2 Summary of CPU Time and Convergence on Different Grids

Grid                 Iteration Number                          Convergence   CPU Time   Machine
                     Cold Flow   Fast Reaction   Finite Rate
49x21x21 (21,609)    10          30              120           5.17 orders   1.77h      Cray-YMP
53x29x29 (44,573)    10          30              120           3.61 orders   3.57h      Cray-YMP
49x65x65 (207,025)   20          30              200           3.30 orders   21.3h      Cray-YMP


The test case shown here is strong swirling combustion in a 3-D gas turbine combustor. Figure 2
shows the inlet velocity vectors; the fuel and air enter the combustor coaxially with strong circulation.
Figure 3 shows the calculated temperature isotherms on the center plane. The velocity vectors are
plotted in Figure 4. The distributions of the main chemical species CH4, O2, CO2, H2O and CO are
presented in the form of isopleths in Figures 5-9. A total of 160 time steps are used for this computation,
including 10 steps for cold flow, 30 steps for fast reaction and 120 steps for the detailed finite-rate
reaction. During each iteration step, 2 multigrid V-cycles are performed for the flow part and 2
iterations for the combustion part. For a 49x65x65 grid of 207,025 grid points, the calculation takes
only about 200 time steps for the finite-rate calculation and 21.3 CRAY-YMP hours to reduce residuals
by three orders of magnitude for all governing equations, demonstrating the high efficiency and
capability of the present method.

Figure 2: Velocity vectors at the inlet

Figure 3: Temperature isotherms on the center (x, y)-plane

Figure 4: Vector plots of the flow field on the center (x, y)-plane (laminar)

Figure 5: CH4 isopleths (mass fraction) on the center (x, y)-plane

Figure 6: O2 isopleths (mass fraction) on the center (x, y)-plane

Figure 7: CO2 isopleths (mass fraction) on the center (x, y)-plane

Figure 8: H2O isopleths (mass fraction) on the center (x, y)-plane

Figure 9: CO isopleths (mass fraction) on the center (x, y)-plane